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ABSTRACT 


The  principle  of  image  estimation  in  the  presence  of 
linear  and  nonlinear  observations  is  considered  in  this 
dissertation  and  a recursive  estimation  algorithm  is 
developed.  The  development  proceeds  from  the  assumptions 
that  the  image  is  statistically  characterized  by  its 
first  two  moments  namely  the  mean  and  the  autocorrelation 
while  the  observation  is  allowed  to  be  a general  function 
of  the  signal  and  noise.  A two  step  recursive  estimation 
procedure,  compatible  with  the  logical  structure  of  the 
optimal  minimum  mean  square  estimator,  is  developed.  The 
procedure  consists  of  a linear  one  step  prediction  and  a 
filtering  operation. 

In  order  to  derive  the  linear  predictor,  the  a priori 
mean  and  autocorrelation  information  is  employed  to  obtain 
a linear  finite  order  model  of  the  two  dimensional  random 
process.  This  model  is  of  an  autoregressive  form  whose 
derivation  requires  only  the  numerical  values  of  the  mean 
and  the  correlation  functions.  At  each  step  of  the  esti- 
mation, the  autoregressive  model  is  used  in  finding  the 
best  linear  predicted  value  and  its  error  variance  as  a 
function  of  past  estimates  and  their  error  variances. 
Following  the  prediction  process,  the  filtering  operation 
proceeds  to  evaluate  the  estimate  and  its  error  variance 
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as  a function  of  the  predicted  value  and  the  observation. 

The  estimation  method  is  applied  to  a number  of  one 
and  two  dimensional  problems  and  the  appropriate  estimators 
are  developed  for  the  cases  where  the  observation  contains 
additive  and/or  multiplicative  noise  term(s) . The  perfor-  | 

mance  of  the  method  is  evaluated  by  applying  the  estimation 
procedure  to  two  dimensional  pictorial  data  corruptev!  by 
additive-Gaussian  and  multiplicative  uniform  noise. 

The  value  of  the  method  has  been  analyzed  and  dis- 
cussed as  to  its  application  to  practical  problems  and 
its  optimality  as  an  estimation  technique. 
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chapter  1 


introduction 


Progress  in  the  sophistication  and  the  computational 
capability  of  digital  computers  has  opened  a field  of  re- 
search in  the  applied  sciences  dealing  with  the  charac- 
terization, understanding  and  analysis  of  pictorial 
This  field  of  image  processing  encompasses  a variety  of 
areas  of  study  such  as  coding,  recognition,  enhancement, 
restoration,  estimation,  data  compression  and  many  more. 

A particular  subject  of  interest  among  these  is  that  of 
image  estimation.  This  subject  deals  with  the  restoration 
of  images  containing  degradations  where  only  some  statis- 
tical properties  of  both  the  image  and  the  degrading  phe- 
nomenon are  known.  In  this  respect,  a picture  is  generally 
viewed  as  a two  dimensional  random  process  (field)  Hl» 

[6]  and  often  characterized  by  its  first  two  moments, 
namely  the  mean  and  the  autocorrelation.  Denoting  the 
brightness  function  of  the  discrete  image  by  b(i,j),  with 
i and  j as  the  row  and  column  counters,  the  two  moments 

are  defined  as 


M(i.i)  = Eb(i,j) 


(1.1) 


and 
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R(i,j,k,l)  = E [b  (i,  j)-M(i,  j)  ] [b(k,D -M(k,  1)  ] 


(1.2) 


where  E is  the  mathematical  expectation  operator. 

The  list  of  degradation  (noise)  introducing  sources 
in  imaging  systems  is  extensive  and  in  particular  includes 
inaccuracies  in  the  sensing  devices,  the  existence  of  air 
turbulence  or  cloud  layer  between  the  camera  system  and 
the  scene,  reflections  from  other  objects  in  the  scene, 
uncertainties  in  the  transmission  systems  and  film  grain 
fioise.  The  degraded  image  (observation)  , denoted  by 
y(i,j),  specifies  the  functional  relationship  of  the  sig- 
nal, b(i,j)  and  the  noise  y(i,j).  Symbolically 

y(i,j)  = f (b (i, j ) , Y (i, j) ) (1.3) 

where  f may  be  nonlinear  and  y(i»j)  be  vector  valued 

(i.e.  more  than  one  noise  term). 

The  values  of  M(i,j),  y(i,j)  and  R(i,j,k,l),  for  all 
i,j,k,l,  the  functional  form  of  f and  the  density  function 
of  Y(i'j)  (1.3)  constitute  the  a priori  information. 

This  constitutes  the  total  amount  of  information  that  the 
estimation  procedure  is  to  use  in  obtaining  the  improved 
image.  An  estimation  procedure  is  the  process  of  assigning 
a value  to  an  unknown  parameter  based  on  the  noise  cor- 
rupted observations  involving  some  function  of  the  para- 
meter. The  assigned  value  is  called  the  estimate  and  the 
system  yielding  the  estimate  is  called  the  estimator.  The 


2 


assignment  of  the  estimate  values,  in  general,  is  based  on 
certain  criterion  known  as  the  estimation  criterion.  One 
such  criterion  is  that  of  minimizing  the  mean  square  error. 

Optimum  filtering  of  images  under  the  general  condi- 
tion of  (1.3)  has  received  little  attention,  while  a vari- 
ety of  procedures  have  been  developed  for  the  special  lin- 
ear case,  where 

y(i,j)  = b (i , j ) +Y (i, j ) (1.4) 

withY(i,j)  white  and  Gaussian  [32]- [37].  Although  (1.4) 
describes  many  natural  forms  of  degradations  [32] -[37], 
there  are  conceivably  as  many  situations  where  this  model 
does  not  apply.  Examples  of  the  above  are  the  film  grain 
noise  and  the  taking  of  pictures  through  a nonhomogeneous 
layer  of  clouds,  where  the  noise  is  a random  attenuation 
factor.  Hence  the  observation  takes  the  form 

y(i,  j)  = Y(i»  j)b(i»  j)  (1.5) 

The  majority  of  these  linear  estimation  techniques 
require  a rather  specific  analytical  representation  of  the 
correlation  function  R(i,j,k,l)  and  in  order  for  heir 
underlying  estimators  to  become  computationally  efficient, 
the  signal  and  the  noise  processes  are  required  to  be  wide 
sense  stationary.  Due  to  these  requirements,  the  above 
methods  may  not  be  suitable  for  certain  practical  problems. 
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Examples  of  which  are  the  cases  where  the  correlation  func 
tion  is  specified  numerically  and/or  partially  (the  corre- 
lation function  is  represented  partially  if  R(i,j,k,l)  is 
defined  only  for  [i-kl^p  and  |j-l|<q,  for  some  integers 
p and  q)  . A property,  though,  of  the'-v;  procedures,  which 
has  great  intuitive  appeal  and  is  crucial  to  the  real  time 
implementation  of  estimators,  is  the  recursive  nature  of 
their  underlying  algorithms.  This  property  arises  from 
Kalman-Bucy  [23]- [24]  estimation  theory  reviewed  briefly 
in  Section  1.2.  Section  1.3  contains  a review  of  one  di- 
mensional nonlinear  estimation  and  the  extended  Kalman-Bucy 
filtering  methods.  These  techniques,  as  will  be  pointed 
out,  deal  with  certain  nonlinearities  in  estimation  prob- 
lems. 

In  this  dissertation,  a general  estimation  method 
will  be  developed  having  the  following  characteristics: 

1.  The  method  will  be  applicable  to  two  as  well  as  one 
dimensional  estimation  problems. 

2.  The  estima+-ion  algorithm  will  only  require  specifi- 
cation of  the  numerical  values  of  Rfi,j,k,l). 

3.  The  procedure  will  be  applicable  to  problems  where 
only  partial  representation  of  the  correlation  function 
is  available. 

[f  The  method  will  be  applicable  to  general  linear  and 

nonlinear  observation  systems  of  (1.3). 
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The  procedure  will  be  iinpleinen table;  i.e.  numerical 
in  nature  and  computationally  feasible. 

Section  1.1  contains  the  definition  of  some  notations 
and  the  description  of  a convention  which  unifies  the  one 
and  two  dimensional  indexing.  A brief  review  of  one  dimen- 
sional estimation  techniques  is  presented  in  Sections  1.2 
and  1.3. 

The  estimation  method  is  derived  in  Chapters  2 
through  5.  In  Chapter  2,  the  structure  of  the  estimator 
is  developed.  It  is  shown  that  the  general  estimation 
technique  consists  of  modeling,  linear  prediction  and 
filtering  steps. 

The  modeling  problem  is  considered  in  Chapter  3.  A 
general  procedure  is  introduced  which  utilizes  the  a priori 
statistics  and  derives  a linear  autoregressive  model  of 
the  process.  Chapter  4 and  5 contain  pertinent  deriva- 
tions of  the  linear  one  step  predictor  anc  the  filtering 
steps,  respectively. 

The  estimation  process,  as  developed  in  Chapter  2 
through  5,  is  applied  to  a number  of  linear  and  nonlinear 
problems  in  Chapter  6.  This  chapter  also  includes  the 
derivation  of  the  estimator  for  a few  special  cases.  In- 
cluded among  these  are  the  cases  of  additive-Gaussian  and 
multiplicative  uniform  observation  noise.  In  Chapter  7, 
the  proposed  estimation  process  is  analyzed  as  to  its 
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computational  requirements.  Discussion  on  the  optimality 
of  the  estimator  is  presented  in  Chapter  8.  Extensions, 
topics  for  further  research  and  conclusions  are  also  in- 
cluded in  this  chapter. 

Appendix  A contains  a brief  discussion  on  the  error 
variance.  Appendix  B describes  a fidelity  measure  for 
estimated  images. 

1,1  Notations 

An  image  is  viewed  as  an  NxN  matrix  with  elements 
b(i,j),  where  b(i,j)  is  the  intensity  of  the  image  at  pixel 
(i,j).  To  reduce  the  notational  complexity,  the  pixels 
are  indexed  by  1 , 2 , . . . . ,N  ,N+1 , . . . • ,N  cons>.  cutively  from 
left  to  right  and  from  top  to  bottom.  This  convention 
enables  us  to  refer  to  the  doubly  indexed  b(i,j)  as  b(k), 
symbolically.  Hence  (1.1)  through  (1.3)  can  be  written  as: 

M(k)  = E[b(k)  1 

R(k,l)  = E[b(k)-M(k)  nb(l)-M(l)  ] (1-7) 


y (k)  = f (b(k)  ,Y(k) ) 


Let  us  define  the  process  x(k)  as 


I' 


Thus  the  problem  of  estimating  b(k)  reduces  to  estimating 
X (k)  . 


1,2  A Survey  of  Discrete  One  Dimensional  Estimation 


At  a given  time  k and  for  a given  set  of  observations 
y (1) , . . . . ,y  (k) , the  minimum  mean  square  (MMS)  estimate  of 
a random  process  x(k)  is,  by  definition,  the  particular 
value  of  x®(k)  which  minimizes  the  quantity  eMk)  defined 
as  [see  Appendix  A] 


g2(k)  = E[x(k)-x^^k)  ] ^ ly(l) ,y(k) 


(1.10) 


Let  us  denote  this  quantity  by  x*^(k)  . Direct  minimization 
of  cMk)  with  respect  to  x®(k)  yields  [22] 


x^(k)  = Ex(k)  |y(l) ,y(k) 


(i.in 


This  is  a general  result,  in  that,  regardless  of  the  under- 
lying probability  density  functions  (PDF)  of  x(.)  and  y(.), 

the  MMS  estimate  is  given  by  (1.10). 

When  x(k)  is  a normal  random  process  and  processes 
X ( . ) and  y(.)  are  jointly  normal  [22],  then  x*^(k)  in  (1.10) 
will  be  linear  in  y(l), ,y(k),  having  the  form 


x*^(k)  = I a^(k)y(i) 
i=l 


(1.12) 


To  determine  the  constants  a,  (k) , . . . . ,a,  (k) , the  right 

‘ k 

hand  side  of  (1.11)  is  substituted  in  (1.9)  and  e^  (k) 
is  minimized  with  respect  to  a j (k)  » • • • • »aj^  (k)  # resulting 
in  k linear  equations 

E [x  (k) (k)  ] y (i ) = 0 i=l,2,....,k  (1.13) 

collectively  referred  to  as  the  orthogonality  principle. 
This  procedure  of  finding  x^ (k) , though  clear  and  simple, 
is  numerically  inefficient  since  for  each  time  k,a  system 
of  linear  equations  has  to  be  solved  where  the  size  of  the 
system  of  equation  grows  with  k. 

Kalman  and  Bucy  [23]- [24]  have  shown  that  if  the  pro- 
cess x(k)  can  be  generated  by  applying  white  noise  to  the 
input  of  a finite  dimensional  linear  dynamical  system, 
then  the  estimation  process  will  be  recursive  yielding  an 
implementable  and  computationally  simple  estimator.  This 
is  done  if  there  exist  a vector  Z (k)  such  that 

x(k)  = C(k)Z(k)  (1.14) 

with  Z(k)  satisfying  a linear  stochastic  difference  equa- 
tion 


Z(k+1)  = A (k) Z (k) +B (k) u (k) 
and  the  observation  having  the  form 


(1.15) 
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y(k)  = C(k)Z(k)+D(k)v(k) 


(1.16) 


Eu (k)  = Ev  (k)  = 0 


(1.17a) 


Eu(i)u'(j)  = K(i)A(i-j) 


(1.17b) 


Ev(i)v'(j)  = L(i)A(i-j) 


(1.17c) 


where  A(k),  B(k).  C(k).  D(k),  K(k),  L(k)  are  nxn.  nxr,  sxn, 
sxq,  rxr  and  qxq  matrices,  respectively.  The  term  A(i-j) 
is  the  Kronecker  delta  function  and  the  prime  indicates 
matrix  transposition. 

The  estimate  of  x(k)  can  be  obtained  from  the  estimate 
of  Z(k)  through  (1.14).  Denoting  the  MMS  one  step  predic- 
tion value  of  Z (k)  by  Z (k) , then 


z®(k)  = EZ(k) ly  (1) ,y  (k-1) 


(1.18) 


and  the  Kalman-Bucy  linear  estimator  is  given  by  [22]-[24] 


Z^(k+1)  = [A(k)-F(k)C(k)]Z®(k)+F(k)y(k) 


(1.19) 


where 


F(k)  = A(k)P  (k)c' (k)  [C(k)P  (k)c' (k)+D(k)L(k)D  (k)  ] 


(1.20) 


P(k+1)  = [A(k)-F(k)C(k)  ]P  (k)  lA(k)-F  (k)C  (k)  ] 

+B(k)K(k)B' (k)+F(k)D(k)L(k)D  (k)F  (k)  (1.21) 

matrix  P(k+1)  has  the  property  that  for  each  k 


P(k+1)  = EfZ(k+l)-z'’(k+l}l  [Z(k+l)-z”(k+D]  ^ 


(1.22) 


The  ai  jve  results  have  been  presented  in  a summarized 
form  (for  more  detail  see  [20],  [22])  and  are  included  in 

order  to  point  out  the  recursive  nature  of  the  solution 
in  (1.19)  through  (1.21).  This  property  is  conspicuous  in 
(1.19)  where  the  estimate  at  time  k+1,  ZCf(k+l),  is  only  a 
function  of  the  estimate  at  time  k,  Z'^(k)  and  the  observa- 
tion at  time  k,  y(k).  It  is  this  attribute  that  makes  the 
Kalman-Bucy  linear  estimator  easily  implementable  on  digi- 
tal computers. 

1.3  Nonlinear  Estimation  and  Extended  Kalman  Filtering 

The  majority  of  the  existing  nonlinear  estimation 
techniques  are  concerned  with  problems  where  the  system 
and  observation  models  (equations  (1.14)  and  (1.15),  re- 
spectively) are  given  as  [22],  [29] 

Z(k+1)  = f [Z  (k)  ,k]+B(k)u(k)  (1.23) 

y(k)  = g[Z (k) ,k]+D(k)v(k)  (1.24) 

where  f(.)  and  g(.)  are  general  nonlinear  functions. 

An  implementable  nonlinear  estimation  approach,  which 


uses  linearization  in  obtaining  a suitable  procedure  to 
estimate  the  states  of  the  nonlinear  system  of  (1.23),  is 


that  of  the  extended  Kalman  method  [221.  In  this  tech- 
nique, relationships  are  obtained  which  describe  the  be- 
havior of  (1.23)  and  (1.24)  in  the  vicinity  of  a nominal 
solution  z*(k).  The  dynamics  of  the  difference  2(k)-z*(k) 
is  characterized  by  a set  of  linear  equations.  This  char- 
acterization is  achieved  by  assuming  that  f(Z(k),k)  and 
g(Z(k),k)  are  twice  differentiable  in  Z(k)  and  defining 

the  matrices 


A(k)  k 


3fi 

■ 

azj 

azj 

!£i 

!ii 

aZj 

• 

• 

az, 

• 

• 

azj 

agi 

azi 

az; 

39  2 

ag 

5 Z(k)=Z* 

« ’ 


= Z*(k)  (1.25) 


C(k)  = 


3Zn  Z(k)=Z*(k)  (1.26) 


Where  A(k)  and  C(k)  are  used  as  coefficient  matrices  of  a 
linearized  representation  of  (1.23)  and  (1.24)  in  the 
neighborhood  of  (k) . 

It  is  shown  in  (22)  that  the  application  of  the  Kalman 
Bucy  estimation  technique  along  with  the  proper  choice  of 
ZM>),  results  in  a recursive  nonlinear  estimator  of  the 

forr 


Z(k+1)  = f [Z(k)  ,k]+F(k)  ly  (k)-g[Z(k)  ,k]  ] 


(1.27) 


where  F(k)  satisfies  (1.20)  and  (1.21)  with  matrices  A(k) 

and  C(k)  defined  by  (1.25)  and  (1.26). 

Aside  from  the  extended  Kalman-Bucy  technique,  there 

are  other  methods  that  consider  the  models  of  (1.23)  and 
(1.24)  [28],  [29].  These  procedures,  however,  lack  the 

ease  of  implementation  inherent  in  the  linear  Kalman-Bucy 
and  the  extended  Kalman-Bucy  techniques. 
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CHAPTER  2 


ESTIMATION  METHOD 


In  this  chapter,  the  structure  of  the  general  non- 
linear estimator  will  be  developed.  Section  2.1  contains 
the  derivation  of  the  optimal  minimum  mean  square  esti- 
mator. Based  on  the  properties  of  this  estimator,  the 
structure  of  a general  implementable  estimation  procedure 
is  developed  in  Section  2.2. 

2.1  The  Minimum  Mean  Square  Estimation 

For  a given  set  of  obse"vation  y (1)  ,....,  y (k)  , tie 
minimum  mean  square  (MMS)  estimate,  x‘^(k),  of  a process 
x(.)  at  time  k is  obtained  from  [22] 

x^'(k)  = Ex(k)|y(l), ,y(k)  (2.1) 

„2 

Similarly  the  error  variance  of  this  estimate,  o (k) , is 
defined  as 

(k)  = E[x(k)-x'’'(k)  ] ^ ly  (1)  , . . . . ,y  (k)  (2.2) 

Defining  the  set  Y (k)  as 

Y(k)  = {y(D, ,y(k-l)}  (2.3) 
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the  equivalent  form  of  (2.1)  is 


xCT(k)  =/x(k)p(x(k)  |Y(k)  ,y(k)  )dx(k)  (2.4) 

with  p(.)  designating  the  probability  density.  By  applying 
Baye's  rule, 

p(x(k)  ,Y(k)  ,y(k)  ) 

p(x(k)  ]Y(k)  ,y(k) ) = 

p(Y(k)  ,y(k) ) 

p(y(k)  |x(k)  ,Y(k)  )p(x(k)  |Y(k)  ) 
p(y(k)  |Y(k)  ) 

(2.5) 

Since  y (k)  is  defined  as  only  a function  of  x(k)  and  y (k) 
(equation  (1.7)),  where  y (k)  is  independent  of  Y (k) , then 

p (y (k) I X (k) , Y(k) ) = p(y (k)  I X (k) ) (2.6) 

This  simplifies  (2.5)  to 

p(y  (k)  I X (k)  ) p (x  (k)  | Y (k)  ) 

p(x(k)  |Y(k)  ,y(k))  = (2.7) 

p(y(k)  |Y(k)) 

The  substitution  of  the  above  in  (2.4)  yields 


1 

x®’(k)  / X (k)p  (y  (k)  | x(k)  ) p (x  (k)  | Y (k)  )dx  (k) 

p(y(k) |Y(k) ) 

(2.8) 


Furthermore 
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P(y{k)  |Y(k)  ) = 


P(y{k)  ,Y(k) ) /p(y  (k)  ,Y(k)  ,x(k)  )dx(k) 


p{Y(k)) 


p(Y{k)  ) 


p(Y{k)  ) 


|p(y  (k)  |x{k)  ,Y(k)  )p(x{k)  I Y (k)  )p(Y  (k)  )dx(k) 


=/p(y(k)  |x(k)  )p(x(k)  |Y'k)  )dx(k) 


(2.9) 


where  again  (2.6)  has  been  used  to  obtain  (2.9).  Using 
(2.9)  in  (2.8)  yields 


/x(k)p(y(k)  |x(k))p(x(k)  |Y(k))dx(k) 
x°  (k)  — 

/p(y(k)  |x(k))p(x(k)  lY(k))dx(k) 


(2.10) 


Similarly  the  error  variance  of  (2.2)  is  given  by 
2 

{k)=/(x(k)-x'^(k)]2p(>:(k)  |y  (k)  ,y  (k)  )dx(k)  (2.11) 

Substituting  (2.7)  in  the  above/  results  in 


^ 1 / [y  (k)-x^(k)  ] ^p(y  (k)  |x(k)  )p(x(k)  |Y  (k)  )dx(k) 


(k)  = 


p(y(k)  Y(k)) 


(2.12) 


Finally  the  substitution  of  (2.9)  in  (2.12)  yields  the 
form  of  (k)  to  be 


(k)  = 


/[x(k)-x®’(k)J  ^p(y  (k)  |x(k))p(x(k)  |Y(k))dx(k) 


/p(y(k)  |x(k))p(x(k)  |Y(k)  )dx(k) 


(2.13) 


Equations  (2.10)  and  (2.13)  suggest  that  the  optimal 


estimation,  at  time  k,  is  achievred  first  by  finding 

p(x(k)|Y(k))  and  then  using  it  with  y(k)  to  arrive  at 
2 

x'^(k)  and  (k)  . Letting  J<-p(k)  and  Op(k)  denote  the 
optimal  prediction  value  and  its  error  variance  of  x(k), 
respectively,  then  (see  (1.17) 

X (k)  =Ex(k)ly(l) y(k-l)  =Ex(k)jY(k)  (2.14) 

ir 

oMk)  = E[x(k)-x  (k)]MY(k)  (2.15) 

p p ' 

But  X (k)  and  (k)  are  the  mean  and  the  variance,  respec- 
P P 

tively,  of  p(x(k)|Y(k))  in  (2.10)  and  (2.13).  Therefore, 
the  optimal  estimation  at  time  k can  be  thought  of  as  a 
two  step  procedure  depicted  in  Fig.  2.1,  where  the  blocks  ^ 

I 

P and  F may  be  identified  as  the  prediction  and  filtering 

steps,  respectively.  j 


Fig.  2.1 
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In  this  system  structure, y (k)  is  isolated  from  the 
other  random  variables  and  if  p(x(k) |Y(k))  is  known,  con- 
ceptually one  can  deal  with  its  nonlinearities  in  block  F . 

So  if  p(x(kjlY(k))  is  given,  then  the  derivation  of  x^^(k) 
and  0®^^  (k)  is  accomplished  Ly  carrying  out  the  integrations 
in  (2.19)  and  (2.13)  , However,  the  derivation  of  this 
probability  density  for  the  general  observation  system 
of  (1.13)  does  not  lend  itself  to  analytic  methods  and  in 
addition,  available  numerical  approaches  are  computation- 
ally unfeasible  [22]. 

In  the  following  section,  an  altf»rnate  procedure  is 
considered,  whereby  an  approximation  to  the  probability 
density  p(x(k)jY(k))  is  derived.  The  method  is  compatible 
with  the  logic  of  the  optimal  estimator  in  Fig.  2.1,  in 
that  this  logic  consists  of  the  representation  of  past 
information  (i.e.  information  due  to  a priori  statistics 
and  observations  y (1) , . . . . ,y (k-1) ) in  the  form  of  a pro- 
bability density  to  be  combined  with  present  information 
‘I 

(i.e.  Y(k))  in  block  F. 

2.2  Definition  of  the  Proposed  Estimator 

In  order  to  comply  with  the  inherent  logical  structure 
of  the  optimum  estimator  and  at  the  same  time  maintain 
the  algorithmic  implementability , the  proposed  estimator 
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xs  constructed  according  to  the  following  restrictions. 

a)  Only  the  first  two  moments  of  any  random  variable 
are  computed. 

b)  The  prediction  process  is  chosen  to  be  linear. 

c)  The  prediction  is  to  be  based  on  a selected  small 
number  of  past  estimates.  This  will  impose  a 
desired  limited  memory  requirement  for  the 
estima tor . 

In  the  estimation  process,  the  imposition  of  con- 
straint (a)  on  the  estimator  results  in  the  derivation  of 
the  estimate  value  and  its  error  variance  at  each  time  k. 
The  value  of  the  variaxice  represents  a measure  of  uncer- 
tainty of  the  estimate's  numerical  value.  This  constraint 
alleviates  the  problem  of  deriving  or  approximating  the 
probability  density  associated  with  the  estimate.  Due  to 
the  admissibility  of  general  probability  densities  for  the 
observation  noise,  the  lack  of  this  constraint  will  require 
estimation  approaches  similar  to  those  described  in  [22, 
Chap.  7],  which  as  mentioned  before,  are  computationally 
unfeasible . 

Since  the  prediction  process  is  primarily  a learning 
procedure  based  on  the  past  information,  then  the  linearity 
requirement  of  condition  (b)  does  not  violate  the  under- 
lying logic  of  the  optimal  estimator.  Although  this  re- 
quirement, in  general,  results  in  suboptimal  processing, 
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it  enhances  the  implementability  of  the  overall  procedure. 
This  enhancement  is  due  to  the  prope’  ty  of  linear  predic- 
tors being  easily  implementable. 

Having  chosen  a specific  form  of  the  predictor,  con- 
dition (c)  requires  the  estimator  to  be  recursive.  This 
characteristic  is  much  desired  in  estimation  processes 
since  it  simplifies  the  implementation  of  the  estimator. 
Basing  the  learning  process  (prediction)  on  the  past  esti- 
mates is  justified  since  each  estimate  is  obtained  so  that 
it  represents  the  actual  signal  value  with  the  least  amount 
of  uncertainty. 

Letting  the  estimate  and  its  error  variance  at  time  i 
be  represented  by  x(i)  and  aUi)  , respectively,  then  the 
block  diagram  of  Fig.  2.2  represents  the  structure  of  the 

estimator. 


y(k) 


Fig.  2.2 


in  this  figure,  blocks  LP,  F and  D signify  linear 
prediction,  filtering  and  one  uit  time  delay  operate  c.:s, 
respectively.  M is  an  indication  of  the  size  of  the  re- 
quired memory  and  x*  (k)  and  o*Mk)  represent  the  one  step 
predicted  value  and  its  error  variance,  respectively.  The 
set  {k-I, is  a set  of  two  dimensional  indices 

(time)  each  distinct  and  prior  to  k,  i.e. 

k-I.  e {1,2, ,k-l}  i=l,2,....,M 

1 

Figure  2.2  describes  the  structure  of  the  proposed 
estimator  whose  operating  logic  is  derived  in  Chapters  3, 
4 and  5.  In  Chapter  3,  a method  is  introduced  to  derive 
a linear  order  model  of  the  process  x(k).  This  model 
i,s  used  in  Chapter  4 to  derive  the  desired  linear  predic- 
tor. Chapter  5 will  describe  the  derivation  of  the  fil- 
tering step. 
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CHAPTER  3 


MODELING 

In  order  to  derive  the  linear  predictor  (block  LP  of 
Fig.  2.2),  the  a priori  mean  and  the  autocorrelation  infor- 
mation of  the  random  process  x(k)  are  incorporated  into 
a finite  order  linear  model  of  the  form 

x(k)  = I 3iX(k-I^)+Bu(k)  (3.1) 

i=l 

This  model  is  used  in  Chapter  4 to  complete  the  derivation 
of  the  prediction  process.  Since  (3.1)  is  an  autoregres- 
sive model  [15],  [161,  [171,  then  finding  such  a model,  in 
effect,  solves  the  one  step  linear  prediction  problem  in 
the  degenerate  case  where  the  values  of  a sample  function 

of  x(i)  is  specified  for  i=l, 2, . . . . ,k-l . 

Section  3.1  contains  a brief  discussion  of  the  form 
and  the  properties  of  the  autoregressive  models.  In 
Section  3.2,  a procedure  is  introduced  that  finds  the  auto 
regressive  model  associated  with  a given  autocorrelation 
function.  The  derivation  of  the  procedure  is  based  on 
the  a priori  knowledge  of  the  maximum  allowable  order  of 
the  model.  The  discussions  and  guidelines  regarding  the 
best  choice  of  the  maximum  order  is  presented  in  Section 


>*! 


% 


L 


3.3.  Finally,  the  properties  of  the  modeling  procedure 
are  considered  in  Section  3.4. 


3.1  Autoregressive  Models 


A discrete  random  process  x(10  is  represented  by  an 
autoregressive  model  of  order  M{k)  if  for  each  time  k, 
x(k)  satisfies  a linear  stochastic  difference  equation  of 

the  form  [15] 


M(k) 


x(k)  = I 6^(k)x(k-I^)+B{k)u(k) 

i=l 


(3.2) 


where  Mlk)  , B.  (k) deterministic  time 

constants  and  u(l),  u(2) are  a set  of  independent, 

identically  distributed  random  variates  with 


Eu(i)  u(  j)  = A (i-j) 


if  i=^j 
if  i=j 


(3.3) 


. j ■ IT  1 9 M(k)  refer  to  pixels  (time) 

The  indices  k-I^,  i=l,2, 

previous  to  k,  i.e. 


k-I  e {k-l,k-2, ,1>  for  all  i-1,2, ,M(k) 

i 


For  a zero  mean  random  process  x(k),  when  constants 


M(k),  B.(k) invariant 


(i.e.  independent  of  k)  then  x(k)  will  be  wide  sense  sta- 
tionary and  (3.1)  takes  the  form 
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i*4  ^ 


i 


M 

x(k)  = I 3ix(k-I . )+Bu{k) 
i=l 

and  M will  be  called  the  order  of  the  model. 

3.2  Modeling  Procedure 

In  the  following  derivations  the  process  x(k)  is 
assumed  to  be  stationary.  The  extension  of  the  procedure 
to  modeling  of  nonstationary  processes  is  presented  in 
Section  3.4. 

Let  M be  the  given  maximum  order  of  the  model  and 
let  S denote  the  set  of  all  indices  preceding  k,  so  that 


tion  of  E[B.u(k)]^,  thus  the  constants  3?,. 
3 

(3.5)  are  chosen  such  that 


N (s  • ) 

E[B.u(k)]2  = E[x(k)-  I ^ Bjx(k-I^)] 
J i=l  ^ 


(3.6) 


is  minimized.  This  criterion  is  the  same  as  the  minimizing 
the  error  variance  of  the  one  step  predicted  value  since, 
E[BjU(k)]^  is  the  error  variance  associated  with  choosing 


the  predicted  value  of  x(k)  to  be 


N(s  ) . 

I ^ 6?x(k-I^) 
i=l  ^ ^ 


Equation  (3.6)  is  minimized  by  differentiating  its 


right  hand  side  with  respect  to 


N(sj) 


and  setting 


the  result  equal  to  zero.  This  results  in  N(Sj)  linear 


equations  of  the  form 


N(s.) 


E[x(k)-  i ^ Bjx(k-I^) ]x(k-lT ) = 0 1=1,2, ,N(s.) 

i=l  ^ ^ -L  3 

(3.7) 

Carrying  the  expectation  through  in  (3.7),  a system  of 
linear  equations  of  the  form  A3=b  is  obtained  where,  the 
elements  of  matrix  A and  vector  b will  be  in  terms  of  the 
values  of  the  correlation  function  R(m,n)  (see  example 


3.1).  Solving  this  system  of  equations  defines  the  quan- 


tities 3^,  i=l, 2, . . . . ,N (Sj ) . These  quantities,  in  turn. 
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■j 

J 


r 


define  the  values  of  (Bj)^  given  as 


N{s.)  . 


(B.)2  = E[B.u(k)l2  = E[x(k)-  i ^ e?x(k-I^)] 
J ] i=l  i 


(3.8) 


Repeating  this  procedure  for  all  subsets  Sj, 
the  quantities 


(Bi)2^  (B^)^, 


(3.9) 


are  obtained.  The  model  of  the  random  process  x{k)  is 
chosen  to  be  the  autoregressive  form  associated  with  sub- 


set Sjj,  such  that  (B^^)  ^ is  the  minimum  of  the  quantities  in 


(3.9).  If  the  minimum  is  not  unique,  then  the  model  is 


chosen  to  be  that  of  Sj^^  where  (Bj^)  ^ is  a minimum  of  (3.9) 


and  N(s_)^N(s  ) if  (B^) ^ is  any  other  minimum.  The  fol- 


'm''''""=p'  '“p' 

lowing  example  is  prov.'.ded  to  clarify  some  of  the  above 
deriva  t Ujmg  . 


Example  3.1: 

Letting  k correspond  to  two  dimensional  index  (i,j) 

and 

Sj  = { (i, j-1) , (i-1, j) , (i-1, j-1) } 


(In  figure  3.1,  0 represents  (i,j)  and  □ represents  ele- 


ments of  Sj  on  a two  dimensional  grid)  then  (3.5)  yields 


x{i,j)  = B^x(i,j-l)  + B2x(i-l,j ) +3^x  (i-1, j-1 ) +BjU (i, j ) 

(3.10) 
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Fig.  3.1 


Applying  (3.7)  and  assuming  (stationari ty ) 

Ex(i, j)x(k,l)  = R(|i-k|,| j-l|)  (3  11) 

The  system  of  linear  equations  for  6i»  3^,  3^  will  become, 


R(0,0)  R(l,l)  R(1,0)|  3? 


R(0,1) 


R(l,l)  R(0,0)  R(0,1)  = R(l,o) 


R(1,0)  R(0,1)  R(0,0)  3^ 


R(l,l)  (3.12) 


Solving  (3.12)  results  in  the  values  for  3^,  3^,  3^  and 
these  values  result  in 


(B.)  2=E[BjU(i,  j)  ] 2 

=E[x(i,j)-3^x(i,j-l)-3ix (i-1, j)-3^x(i-l, j) ] 

=R(0,0)-3^R(0,1)-3^R(1,0)-3^R(1,1) 

For  any  other  subset  s^,  the  above  procedure  is  repeated 
and  (B^) ^ in  (3.9)  is  obtained.  Note  that  in  finding 
(Bj)^  from  (3.13)  and  (3.12)  only  the  numerical  values  of 


of  R(m,n)  are  required. 

Having  defined  a general  modeling  procedure,  the 
following  theorem  establishes  its  applicability  to  special 
cases . 

Theorem  3.1:  If  a zero  mean,  normal  random  process  x(k) 

satisfies  an  autoregressive  model 

x(k)  = a^x  (k-I^)+Bu  (k)  (3.14) 

i=l 

where  then  the  modeling  procedure  will  yield  the  same 

model . 

Proof:  Let 

s = ^(k“Ij),  (k— I j),....,  (k—Ij^)  } 

let 

where  is  an  index  (time)  preceding  k and  not  includ- 

ed in  s,  i.e.  q^s 
let 


s j = sUq  = { (k-I  j ) , (k— 1 2),...-»(k-Ijjj),  ^ ^ 

Using  the  modeling  procedure  to  find  the  autoregressive 
form  for  Sj,  results  in 


f h: 


t • • • • t 


3 are  obtained  such  that 

M +1 


and  3j 


E[x  (k)-Xp(k)  ] 


is  miniir.ized,  where  (see  (3.6)) 
M +1 

X (k)  = I 6^x(k-I  ) 


(3.17) 


But  due  to  the  Gaussian  nature  of  x(k),  the  quantity 


Ex(k)  |x(k-Ii  ) 


(3.18) 


is  linear  in  x(k-I,) and  if  substituted  for 

X (k) , it  minimizes  (3.16)  [22].  Therefore 

P 

M +1 

Ex(k)i  x(k-I,)  = I BiX(k-I^) 


Also,  using  the  conditioning  of  (3.18)  in  (3.14)  results  in 


Ex  (k)  1 X (k~I  j)  ,....,x  + 


= ^ (X  • Ex  ( k~  I . ) 1 X (k~  Ij ) , . . • • + 

i=l  ^ 


+BEu  (k)  X (k“  Ij)  ,.«««»x  +1^ 


(3.20) 


Eu(k)  lx  (k-Ij ) + " Eu(k)  - 0 


Therefore  (3.20)  becomes 


Ex (k) 1 X (k-I j ) , 


But  the  conditional  expectation  in  (3.18)  is  a specific 
linear  function  of  x (k— I |)/....»x  +1  ^ * thus  the  com- 

parison of  (3.19)  and  (3.21)  necessitates  that  in  (3.19) 


6i  = 


Oi  if  i<M 


if  i=M  +1 


This  is  turn  indicates  that  in  (3.15) 


n,  = B 


Now,  by  letting  the  set  q be  empty,  then  the  same 
procedure  will  indicate  that  (3.15)  becomes  identical  to 
(3.14)  and  by  letting  q contain  m elements  with  M +m^M, 
then  (3.22)  and  (3.23)  will  become 


^ = 


if  i4M 


if  i>M 


m 


This  completes  the  proof. 


3.3 


Choice  of  the  Model's  Order 


The  modeling  procedure  of  Section  3.2  is  particularly 
useful  in  the  cases  where  only  a small  number  of  correla- 
tion values  are  specified.  In  these  cases,  an  upper  bound 
for  the  value  of  M exists.  If  this  upper  bound  is  suf- 
ficiently small,  then  its  value  should  be  used  to  represent 

the  order  of  the  model. 

Since  the  modeling  criterion  is  taken  to  be  the  mini- 
mization of  the  model's  noise  variance  then,  in  general, 

M should  be  selected  on  the  basis  of  the  rate  of  decrease 
of  (B^)2  as  a function  of  M.  This  idea  is  applied  and 

explored  further  in  example  3.2. 

It  will  be  shown  in  Chapters  4 and  5 that  the  com- 
plexity and  the  memory  requirement  of  the  proposed  esti- 
mator will  be  a direct  function  of  the  value  of  M.  A 
logical  consideration  in  the  choice  of  M,  therefore,  is 
the  trade  off  between  additional  implementation  complexity 

and  the  reduction  of 
Example  3.2; 

Consider  the  stationary  two  diemnsional  correlation 


function 


J 


R(i,j,k,l)  = R(|i-k|, 1 j-l|)  = Ex(i,j)x(k,l) 


= exp[-/(i-k)  ^+(  j-1)  M 


Application  of  the  modeling  procedure  yields: 

a)  Best  2nd  order  model  is 

x(i, j)=0.3x(i, j-l)+0.3x(i-l, j)+0.883u(i, j) 

b)  Best  3rd  order  model  is 

x(i, j)=0.29x(i, j-l)+0.25x(i-l, j)+0.12x(i-l, j+1) 
+0.8775U (i, j) 

c)  Best  4th  order  model  is 

x(i, j)=0.28x(i, j-l)+0.24x(i-l, j)+0.03x(i-l, j-1) 
+0.12x(i-l, j+l)+0.8769u(i, j) 

d)  Best  5th  order  model  is 

x(i, j)=0.28x(i, j-l)+0.24x(i-l, j)+0.03x(i-l, j-1) 
+0.11x(i-l, j+l)+0.02x(i-l, j+2)+0.8768u(i, j) 

This  indicates  that  additional  complexity  of  going  from  the 
3rd  to  the  5th  order  does  not  reduce  (B^) ^ appreciably. 
Hence,  for  example,  to  a third  decimal  place  accuracy, 

3rd  order  model  is  a sufficient  approximation. 


3.4  Properties  of  the  Modeling  Procedure 


Derivation  of  the  autoregressive  model  has  been 
based  on  minimizing  the  uncertainty  associated  with  pre- 


t-' 


dieting  the  present  value  of  a sample  function  of  a 
process  by  a linear  combination  of  a finite  number  (M) 
of  its  past  values  [16]  . This  is  in  accord  with  the  con- 
cept of  Fig.  2.2.  The  modeling  ^ -ocedure  is  directly 
applicable  to  nonstationary  problems  where  for  these 
cases  the  procedure  must  be  applied  at  each  time  k,  re- 
sulting in  one  autoregressive  form  for  each  k. 

A property  of  the  procedure,  which  is  of  extreme  prac- 
tical value,  is  that  the  determination  of  the  model  is  only 
a function  of  the  numerical  values  of  R(m,n)  and,  in  fact, 
is  independent  of  the  analytical  form  of  R(m,n).  This, 
in  turn,  enhances  the  numerical  and  computational  character 
of  the  estimation  process. 

The  optimality  of  the  modeling  procedure,  in  the 
cases  where  the  process  x(k)  has  a corresponding  autore- 
gressive model  of  order  less  than  M,  is  established  by 
theorem  3.1.  However,  by  applying  this  method  a model  of 
the  form  (3.1)  can  always  be  found  even  when  a finite  order 
autoregressive  model  does  not  precisely  describe  the  cor- 
relation information,  or  if  the  exact  model  is  of  an 


order  higher  than  the  chosen  M.  Whatever  the  case,  it 
should  be  noted,  the  correlation  function  generated  by 


I 


Consider  the  stationary  one  dimensional  random  pro- 


cess z(k)  with 


R(i,j)  = Ez(i)z{j)  = R(li-j 


Let  M=2  and  the  model  of  z{k)  be  given  by 


x(k)  = e,x(k-l)+32x(k-2)+Bu(k) 


The  system  of  linear  equations  that  and  3j  must  satisfy, 


becomes  (see  Example  3.1) 


R(0)  R(l)  3, 


R(l)  R(0)  3 


R(l) 


R(2) 


which  results  in 


R(0)R(1)-R(1)R(2) 


3,  = 


R^ (O)-R^ (1) 


R(0)R(2)-R^  (1) 


3,  = 


R^  (O)-R^  (1) 


Accordingly 


= E[x(k)-3,x(k-l)-32x(k-2)  ] 


= R(0)-3,R(1)-32R(2) 


Letting 


C(li-jl)  = Ex(i)x(j)  (3.31) 

be  the  stationary  correlation  function  generated  by  (3.27), 
then 

C(0)  = ExMk)  = E[B,x(K-l)+&2x()c-2)+Bu(k)  ] ^ 

= (bJ+B2)C(0)+B^+2B,B2C(1)  (3.32) 

C(l)  = E(x(k)x(k-1)  = E[B,x(k-l)+B2x(k-2)+Bu(k)  ]x(k-l) 

= B,C  (0)+B2C(1)  (3.33) 

C(2)  = Ex(k)x(k-2) 

= E[BiX(k-l)+B2x(k-2)+Bu(k)  ]x(k-2) 

= B,C(1)+B2C(0)  (3.34) 

Substituting  (3.28)  through  (3.30)  in  (3.32),  (3.33)  and 
(3.34)  and  solving  for  C(0),  C(l),  C(2)  yields 

C(0)  = R(0) 

C(l)  = R(l)  (3.35) 

C(2)  = R(2) 

This  indicates  that  the  correlation  function  generated 
by  (3.27)  matches  R(i,j)  at,  at  least,  3 points. 
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CHAPTER  4 


LINEAR  PREDICTION 

In  this  chapter  it  is  assumed  that  the  model  of  the 
random  process  x(k)  is  derived  to  be  of  the  form 
M 

" ^|^3^x(k-I^)+Bu(k)  . 


and  the  process  x(k)  is  assumed  to  be  stationary.  Deri- 
vation of  the  linear  predictor  for  nonstationary  processes 
is  identical  to  that  of  stationary  processes,  hence  is 
omitted.  Section  4.1  contains  a brief  discussion  on  t*^e 
optimal  linear  one  step  prediction  and  the  difficulties 
associated  with  implementing  such  a procedure.  The  devel- 
opment and  derivation  of  an  implementable  one  step  pre- 
dictor compatible  with  the  proposed  system  structure  of 
Fig.  2.2  is  presented  in  Section  4.2.  Section  4.3  contains 

the  derivation  of  the  variance  of  the  one  step  predicted 
value . 


4.1  Optimal  Linear  One  Step  Prediction 

Denoting  the  HMS  one  step  prediction  value  of  x(k) 
by  x'’(k),  then 
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(4.2) 


xP(k)  =Ex(k)ly(l). ,y(k-l) 

conditionln,  the  right  hand  side  of  (4.1)  on  yd)  , . . .y(h-D 
and  taking  the  expectation  results  in 


^P,k)=?  6.Ex(k-Ii)ly(l) y()c-l)tBEu()0ly(l)....y(k-l) 

i=l  ^ (4.3) 


But  due  to  the  s 


tatistical  independence  and  the  zero  mean 


property  of  u(k) 


Eu(k)  ly(l) y(k-l)  - Eu(k)  - 0 


(4.4) 


hence 


xP(k)  = 


I B Ex(k-i.)  ly(i) ,y(k-i) 

i ^ 


(4.5) 


The  difficulty  in  finding  xP()0  in  (4.5)  is  that  at 
each  pixel  k,  the  M expectations 


X 1 /II  vfk-1)  i=l,  2, /M  (4.6) 

Ex(k-i^)  ly (1) ' • • • • 


• A Performing  this  task  involves 

have  to  be  carried  out.  Fertormxny 


interpolation  of  the  random  process  x(k),  which  in  the 
simplest  case  is  computationally  unfeasible.  To  maintain 
implementability.  a suboptimal  recursive  prediction  pro- 
cedure is  introduced  in  the  following  section. 
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4.2 


A Linear  One  Step  Prediction 


In  accordance  with  the  discussion  of  Chapter  2 and  the 
system  structure  of  Fig.  2.2,  the  linear  predictor  is 
to  be  based  on  past  estimates  and  their  error  variances. 

In  this  respect,  the  one  step  predicted  value  x* (k)  is 
given  by 

k-1  . 

X* (k)  = L a.x (k-j)  (4.7) 

j=l  ^ 

where  '“k-1  chosen  such  that 

E(x(k)-x* (k) ] ^ (4.8) 

is  minimized.  The  above  minimization  is  to  be  carried  out 
based  on  the  available  information  to  the  predictor.  This 

information,  in  turn,  consists  of  the  values  of  x(i)  and 

2 ^ ^ 2 , 

o (i) f i^k-1.  Furthermore,  each  x(i)  and  a (i)  represents 

the  mean  and  variance,  respectively,  of  a posteriori  pro- 
bability density  of  x(i)  at  time  i.  This  interpretation 

A 

is  directly  substantiated  by  the  way  each  estimate  x(i) 
at  time  i is  obtained  as  a function  of  the  previous  esti- 
mates and  the  observation  y(i). 

As  a result  of  the  above  discussion,  the  expectation 
in  (4.8)  is  well  defined  and  operates  on  each  random 
variable  x(i)  such  that 
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Ex(i)  = x(i) 


i=l,2, ,k-l 


(4.9a) 


E[x(i)-x(i)  1 = 0 (i) 


i=1.2, ,k-l 


(4.9b) 


Having  (4.9a)  and  (4.9b)  as  the  definition  of  the  expecta- 
tion operator,  the  following  theorem  establishes  the 
optimal  linear  predictor. 

Theorem  4.1:  When  the  random  process  x(k)  satisfies  a 

model  of  the  form  (4.1),  then  the  (optimal)  choice  of 

“lc-1  minimizes  (4.8),  is  given  by 


if  k-j=k-I^  for  some  i 


— 1,2,....,^ 


]o  otherwise 


Proof ; 


substitution  of  (4.1)  and  (4.7)  in  (4.3)  yields 


Elx(k)-x*(k)i"=ttBu(k)+  I S.:-(k-I.)-.I  Ojk(k-j))  ( 

i=l  D"-*- 


4.10) 


Since  k- 
indices 


-I.,  i=l,2,....,M  is  a set  of  two  dimensional 
and  their  particular  values  are  immaterial  to  this 


proof,  assume 


I =1 
i 


(4.11) 


in  order  to  reduce  notational  complexity.  With  this. 
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(4.10)  then  becomes 


If: 


2 M k-1 

E[x(k)-x* (k) ] =E[Bu(k)+  I B.x(k-i)-  I a.x(k-j)] 

i=l  ^ j=l  ^ 


I 


M 


k-1 


=E(Bu(k)]  +E{  I B.x(k-i)-  I a x(k-j)] 
i=l  ^ j=l  ^ 


M 


k-1 


+2E{[Bu(k)l(  I B.x(k-i)-  I a.x(k-j)]} 
i=l  ^ j=l  J 


(4.12) 


But  (4.9)  and  the  statistical  independence  of  u(k)  imply 
that 


2 2 

E(Bu(k)l  = B 


(4.13) 


M 


k-1 


E{[Bu(k)][  y B.x(k-i)-  I a.x(k-j)]} 
i=l  ^ j=l  ^ 


= 0 


(4.14) 


Substitution  of  (4.13)  and  (4.14)  in  (4.12)  yields 


2 2 ^ 

E[x(k)-x* (k) ] =B  +E[  I g.x(k-i)-  I a.x(k-j)] 

i=l  j=l  ^ 


M M ^ ^ 

+E[  I B.x(k-i)-  I a._.x(k-j)-  I a.x(j)] 
i=l  ^ j = l J j=l  J 


=B 


(4.15) 


Since 
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M M ^ 

I a x(k-i)=  I [Bi+ntk_i-3i]x(k-i) 
i=l  k-i  i=l 


M M 

= y 3-x(k-i)+  y .-B.)x(k-i) 

.'',1  k-i  1 


(4.16) 


then,  substitution  of  (4.16)  in  (4.15)  and  the  expansion 
of  the  square  term  results  in 


2 

E[x(k)-x*  (k)  ] 


k-M-1  - 

+E[  y a.x(i)] 
i=l  ^ 


a . X ( i)  ] 
1 


-3.  )x(k-i) 
1 


M 

+ 2E[  I («!-_  .-3.  )x(k-i)  1 I 

• — 1 i X 


k-M- 


I 

i=l 


1 


a .x(i) ] 


(4.17) 


But  in  (4.17)  x(i),  i-1,2. 


,k-l  are  a set  of  nonrandom 
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quantities.  This  along  with  (4.9a)  indicates  that 


•'j[ 

h 


M 


M 


E[  I 3-  (x  (k-i)-x(k-i)  ) ] [ I (a  .-B.)x(k-i)] 
i=l  ^ i=l  ^ 


M 


M 


[ I 6;  (Ex(k-i)-x(k-i) ) ] I I (a  .-6.)x(k-i)] 
i=l  i=l  ^ 


M 


M 


= [ I 6-  (x(k-i)-x(k-i) ) ] [ I (a.  .-B.)x(k-i)]  = 0 

i=l  ^ i=l  ^ 


(4.18) 


and  similarly 
M 


k-M-1 


E[  I B. (x (k-i) -X (k-i) ) ] [ I a.x(i)] 
i=l  ^ i=l  ^ 


= 0 


(4.19) 

Furthermore,  realizing  that  the  third,  fourth  and  seventh 
terms  of  (4,17)  are  nonrandom,  use  of  (4.18)  and  (4.19) 
reduces  (4.17)  to 


E(x(k)-x* (k) ] 


M 


M 


=B  +E[  I 6.  (x(k-i)-x(k-i) ) ]+[  I (a  .-B.)x(k-i)] 
i=l  i=l  ^ 


k-M-1 


2 M 


k-M-1 


+ [ I a.x(i)]  +2[  I (a  .-$■) x{k-i) ] [ I a.x(i)] 
i=l  ^ i=l  i=l  1 


or 
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tSii 


E[x(k)-x* (k) ] 


=B  +E[  I e.  (x(k-i)  -x(k-i)  ) ] 
i=l 


M -V  k-M-1  - 2 

+ [ I [a  . -3  J x(k-i)-J-  I a.x(i)] 
i=l  ^ i=l  ^ 


(4.20) 


The  first  two  terms  of  (4.20)  are  independent  of 

i=l , 2 ,...., k-1 , The  third  term  is  a complete  square 
and  its  minimum  is  zero.  Therefore,  the  minimum  of  (4.20) 
is  achieved  if 


3^  for  i=k-l ,k-2 ,...., k-M 

“i  " 

0 for  i=l, 2, k-M-1 


(4.21) 


This  completes  the  proof  of  Theorem  4.1. 

The  foregoing  theorem  establishes  that  the  best  linear 
predictor  is  given  by 


x*(k)  = I 3.x(k-I.) 
i=l  ^ 


(4.22) 


which  yields  itself  to  on  line  implementation  and  satisfies 
the  finite  memory  requirement  of  Chapter  2. 


4.3 


Error  Variance  of  Predicted  Value 


Letting  o (k)  denote  the  error  variance  of  the  pre- 
P 


dieted  value  x* (k)  at  time  k,  then 


(k)  = E [x (k)-x*  (k) ] 
P 


(4.23) 


Substituting  for  x(k)  and  x* (k)  in  (4.23)  from  (4.1)  and 
(4.22),  respectively,  yields 


2 ^ ^ 
a (k)--E  [Bu(k)+  I 6.x(k-I.)-  ),  3^x{k-I. 
P i=l  ^ ^ i=l 


) ] 


’-b\e{  I B.  [x(k-I.  )-x(k-I  .)  ] }* 
i=l 


(4.24) 


where  in  deriving  (4.24),  use  of  the  statistical  indepen- 
dence and  the  zero  mean  property  of  u(k)  is  made.  Let 


(k-I^)  = x(k-I^)-x(k-I^) 


(4.25) 


then  at  each  time  k-I^,  e(k-I^)  is  a random  variable  whose 
fir^t  two  moments  are  (see  (4.9a)  and  (4.9b) 


Ee(k-Ij^)  = 0 


2 


Ee  (k-I^)  =0  (k-Ij^) 


(4.26) 


(4.27) 


Expansion  of  (4.24)  in  terms  of  e(.)  as  defined  in  (4.25) 


results  in 


} 

i 


22”  " 

0 (k)=B  +E[  I B.e(K-I^) 1 

P i=l  1 

=B^BiEeNk-Ij)  + 

+ 26iB2Ee(k-Ii)e(k-l2)  + . • .+2B,  B^^Ee  (k-I  j ) e (k-I^) 

2 

This  relation  shows  that  the  evaluation  of  Op(k) 

requires  the  knowledge  of  cross  covariances  of  the  random 

variables  e(k-I.),  i=l, 2, . . . . ,M.  To  avoid  the  numerical 

1 

difficulties  associated  with  the  evaluation  of  these  cross 
covariances  at  each  time  k,  the  following  upper  bound  of 
a'(k),  denoted  by  a*'(k),  is  derived  and  used  in  the  fil- 
tering step  of  the  next  chapter.  The  reason  for  finding 
an  upper  bound,  as  opposed  to  a lower  bound,  is  due  to  the 

fact  that  (k)  is  a measure  of  uncertainty  of  the  value 

P 2 

x*(k),  thus  by  assigning  value  a*  (k)  to  x*  (k)  the  uncer- 
tainty associated  with  x* (k)  is  increased.  This  causes 
the  es*-imation  process  to  remain  suboptimal  and  is  dis- 
cussed in  more  detail  in  Chapter  8. 

Taking  the  absolute  values  of  the  right  hand  side  of 
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(4.28)  and  using  (4.27)  results 


in 


Op(k).$B%l  6,  I (k-Ij)  + + |3^|  0^ 


+ 2l6i I I62 1 lEe(k-Ij)e(k-l2)  | + 


+2IB2I IB3I  |Ee(k-l2)e(k-l3)  [ + 


(4.29) 


But  for  each  pair  of  random  variables  e(k-I^)  and  e(k-Ij), 
Caachy-Shwartz  inequality  f30]  establishes  that 


>/2 


Ee(k-I . )e(k-I . ) U (Ee  (k-I.)Ee  (k-I.)]  =0  (k-I 0 (k-I . ) 

1 D ID  ^ 

(4.30) 


so 


2^2 


0 (k)($B  +|6 J o (k-Ij)  + , 


,+  |B  1 0 (k-i  ) 
m'  M 


+2lSi  I IB2 lo(k-I,)o(k-l2)  + , 


«|B„.^lle^|o(k-i„_i);(k-y 


(4.31) 


where  (4.31)  is  equivalently  written  as 


o'  (k).<B'+[  I IB.  lo(k-I,)]  (4.32) 

P i=l  ^ ^ 
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Hence,  the  upper  bound  a*  (k)  is  given  by 


2 2 M 2 

0*  (k)=B  ■¥[  I la.  |o(k-I.)] 


(4.33) 


i=l 


It  should  be  noted  that  a*  (k)  can  easily  be  evaluated  at 


2 


each  time  k,  since  quantities  o (k-I.),  i=l,2,....,M  are 

1 


available  to  the  predictor  for  each  k. 


CHAPTER  5 


FILTERING 


This  chapter  develops  the  derivation  of  the  final  step 
of  the  proposed  estimation  technique,  where  block  F of 
Fig.  2.2  will  be  analyzed  and  defined.  In  Section  5.1, 
the  information  obtained  from  the  linear  predictor  is 
incorporated  into  an  approximate  probability  density  func~ 
tion  (PDF)  for  x(k).  This  density  is  then  used  in  Section 
5.2  as  an  a priori  statistic  for  the  observation  y(k)  to 
derive  the  estimate  x(k)  and  its  error  variance  o (k)  . 


5.1  Choice  of  A Posteriori  PDF  for  x(k) 


Assuming  that  the  actual  prediction  variance  Op(k)  is 

computed  and  available  then  the  predicted  value  x* (k)  and 

its  variance  (k)  represent  the  mean  and  the  variance  of 
P 

the  a posteriori  PDF  on  x(k).  This  density  represents 
the  available  knowledge  of  the  random  variable  x(k)  prior 
to  the  reception  of  the  observation  y (k) . Since , for 
a given  mean  and  variance  the  normal  distribution  repre- 
sents the  maximum  uncertainty  (entropy)  [30],  this  density 
function  is  assumed  to  be  normal.  Further  uncertainty  is 
associated  with  x(k)  if  o*Mk)  is  used  in  place  of  o^(k). 


I 


Consequently,  an  approximate  and  a rather  conservative 
choice  of  an  a posteriori  probability  density  for  x(k)  is 

2 


p (x  (k) ) 


1 

o*  (k)  /2tT 


Ix (k) -X*  (k) ] 

exp{ 2 } 

2a*  (k) 


(5.1) 


5.2  Evaluation  of  the  Estimate  and  its  Variance 


The  density  p(x(k))in  (5.1)  is  used  as  an  a priori 
statistic  for  y(k)  in  order  to  obtain  the  Baye's  estimate, 
x(k),  of  x(k)  as  follows 

x(k)  =Ex(k)|y(k)  = /x (k) p (x (k) 1 y (k) ) dx (k)  (5.2) 


Using  the  Baye's  rule 

p(x(k),y(k))  p(y  (k)  |x(k)  )p(x(k)  ) 

p(x(k)|y(k))  = = 

P(y(k))  P(y(k)) 

(5.3) 


Therefore 


- 1 

x(k)  = /x(k)p(y  (k)  |x(k)  )dx(k)  (5.4) 

p (y  (k) ) 

Using  the  same  procedure  as  outlined  in  Section  2.1,  it 
follows  that 

p(y  (k)  )=Jp(y  (k)  ,x(k)  )dx(k)=/p(y  (k)|  x(k)  )p(x(k)  )dx(k) 

(5.5) 
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Therefore,  the  estimate  x(k)  is  given  by 

/x(k)p(y(k)  |x(k)  )p(x(k)  )dx(k) 

x(k)  = ^ (5.6) 

Jp(y(k)  |x(k)  )p(x(k)  )dx(k) 

. « ^2 
Similarly,  the  error  variance,  o (k) , is  obtained  by 

o'  (k)  = E[x(k)-x(k)  ] ' |y  (k) 

= Ix(k)-x(k) ] ^p(x (k) |y(k) )dx(k)  (5.7) 

Again  by  using  the  identity  of  (5.3),  (5.7)  becomes 


2 

0 (k) 


p(y  (k) ) 


/[x(k)-x(k)]  p(y(k)  | x (k)  ) p (x  (k)  ) dx  (k) 


(5.8) 


The  substitution  of  (5.5)  in  (5.8)  results  in 


2 

.2  /(x(k)-x(k)]  p(y  (k)  |x(k)  )p(x(k)  )dx(k) 

o (k)  = — 

/p(y  (k)  |x(k)  )p(x(k)  )dx(k) 


(5.9) 

In  (5.6)  and  (5.9),  p(x(k))  is  given  by  (5.1)  and 
p(y(k) |x(k) )is  obtained  from  that  part  of  the  a priori 
information  which  describes  the  observation  system  struc- 
ture and  the  probability  density  of  the  observation  noise 


(equation  (1.7) ) . 

The  comparison  of  equations  (5.6)  and  (5.9)  with  those 
of  the  optimal  filter  given  in  (2.10)  and  (2.13)  indicates 
that  the  proposed  estimator  exhibits  the  same  logic  as  the 
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optimal  estimator.  This  logic  consists  of  each  estimate 
and  its  error  variance  being  a specific  function  of  the 
past  and  present  information  where  p(x(k))  and  p(y(k) lx(k)) 
represent  these  two  quantities,  respectively.  In  fact, 
had  the  prediction  been  done  optimally,  the  proposed  pro- 
cedure would  have  been  optimal. 

/\  /»  2 

The  integrals  involved  in  evaluating  x(k)  and  o (k) 
in  (5.6)  and  (5.9)  may  or  may  not  have  analytic  solutions. 

If  such  solutions  exist,  then  the  computational  require- 
ment of  the  procedure  is  reduced  tremendously.  If  such 

solutions  do  not  exist  then  these  integrals  can  be  evalu-  i 

ated  numerically.  This  in  turn  allows  the  procedure  to  be 
applicable  to  a broad  class  of  observation  systems  includ- 
ing nonlinear  forms  of  y(k).  Chapter  6 contains  examples 

demonstrating  and  substantiating  these  properties.  i 

The  estimator  developed  so  far  is  both  feasible  and  i 

I 

implementable . Its  feasibility  is  due  to  the  structure 

of  Fig.  2.2,  which  leads  to  (5.6)  and  (5.9)  being  scalar  jj 

operations.  | j 


CHAPTER  6 


DERIVATION  OF  A FEW  SPECIFIC 
ESTIMATORS  AND  APPLICATIONS 

6.1  Introduction 

Based  on  the  general  estimation  procedure  as  developed 
in  Chapters  2 through  5,  a number  of  specific  estimators 
for  various  observation  systems  are  derived  in  this  chap- 
ter. Each  estimator  then  is  utilized  in  estimating  a 
number  of  noisy  images  as  examples  of  the  applicability 
of  the  procedure.  The  procedure  is  almost  exclusively 
applied  to  two  dimensional  pictorial  data  and  its  appli- 
cability to  similar  one  dimensional  problems  is  implied 
implicitly.  Section  6.2  contains  a discussion  on  the 
methods  of  finding  and  specifying  two  dimensional  a priori 
correlation  functions.  In  Section  6.3,  a linear  estimator 
is  derived  by  applying  the  estimation  procedure  to  the 
case  of  additive-Gaussian  observation  noise.  The  pro- 
perties of  this  linear  estimator  are  outlined  and  dis- 
cussed in  more  detail  in  Section  6.4.  Section  6.5  con- 
tains the  derivation  of  the  estimator  for  observations 
having  bounded  multiplicative  noise.  This  section  also 
includes  the  application  of  the  procedure  to  two  dimen- 
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sional  pictorial  data  corrupted  by  uniform  multiplicative 
noise.  Finally  in  Section  6.6,  the  application  of  the 
procedure  to  observations  containing  multiplicative  and 
additive  noise  terms  is  considered. 


6.2  Two  Dimensional  A I'riori  Statistics 


In  applying  the  foregoing  estimation  method  to  the 
noise  corrupted  two  dimensional  pictorial  data,  the  know- 
ledge of  the  a priori  mean  and  at  least  a few  values  of 
the  autocorrelation  function  of  the  image  is  reguired. 
These  two  quantities  are  defined  by  (1.1)  and  (1.2).  If 
the  image  is  a member  of  a stationary  two  dimensional  ran- 
dom process,  then  they  are  defined  to  be 


M(i,j)  =M=Eb(i,j) 


(6.1) 


R(i,  j,k,l,)  = R(  |i-k  |,  lj-1  1)  = E[b(i,j)-M]  [b(k,l)-M] 

(6.2) 

where  E represents  the  ensemble  averaging. 

Experimental  results  indicate  that  random  fields  with 
exponential  autocorrelation  functions  are  realistic  models 
for  a variety  of  pictorial  data  [2]  - [5] . Two  widely 
used  forms  of  these  functions  for  stationary  processes  are 


R(  [i-k  1,  jj-l  |)  = o exp[-a j |i-k  ]-a2  |j-l  11 


(6.3) 
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I 

J 


and 


1 I I I * / ^ 2 2 2 

R( 1 1-k  I , I j-1 1 ) = a^exp[-/a, (i-k)  +a2(j-l)  ] (6.4) 

2 

Where  is  the  signal  power  and  | i-k  | and  lj-l|  are  the 
increments  in  the  verticle  and  horizontal  directions, 
respectively.  Although  these  correlation  functions  are 
both  of  the  exponential  decaying  type,  they  exhibit  dis- 
similar characteristics  in  that  the  separable  correlation 
function  of  (6.3)  assumes  more  correlation  in  the  hori- 
zontal and  vertical  directions  while  the  nonseparable 
function  in  (6.4)  indicates  a smooth  and  rotationally 
invariant  correlation  in  all  directions.  Figures  6.1  and 
6.2  represent  two  views  of  the  three  dimensional  graphs 
of  these  functions.  It  is  suggested  in  [1]  that  pictures 
of  the  natural  scenes  exhibit  nonseparable  and  rotationally 
invariant  correlations  while  the  images  of  man-made  objects 
correspond  to  separable  autocorrelation  functions. 

The  complete  definition  of  the  correlation  function  in 
(C.3)  or  (6.4)  requires  the  specification  of  the  three 
quantities  o^,  Uj  and  ot^-  Note  that  if  any  three  correla- 
tion values  (for  example  R(0,0),  R(0,1),  R(1,0))  are  known 
then  these  quantities  can  be  obtained. 

For  the  processed  images  in  this  dissertation,  the 


approximate  values  for  the  mean  M and  correlations  R(0,0), 
R(0,1)  and  R(1,0)  are  obtained,  wherever  necessary,  from 


(6.5) 


1 N 


M = ~ I 
N i=l 


N 

I b(i, j) 
j=l 


1 N-p  N-q 

R(p.q)=-r  I I [b{r,s)-M] [b{r+p,q+s)-M] 

N r=l  s=l 


(6.6) 


where  b(i.j)  is  the  intensity  value  of  the  original  image 
at  pixel  (i,j)  and  N represents  the  size  of  the  image. 

6.3  Additive-Gaussian  Observation  Noise 


In  the  case  where  the  observation  is  given  by 


y (k)  = x(k)+Y (k) 


(6.7) 


with  v(k)  normal  and 


EY(k)  = 0 


(6.8) 


EY(i)Y(l)  =\  2 

0 (i) 
Y 


if  i # j 
if  i = j 


(6.9) 


then 


p (y  (k)  1 X (k)  ) = 


/2ia^ (k) 


(y(k)-x(k)) 
expl-  i 1 

20y (k) 


(6.10) 


Lemma  6.1:  The  estimate  x(k)  and  its  error  variance 

G'(k),  lor  this  case  of  additive-Gaussian  observation  noise 
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are  given  by 


o*Mk) 


a‘ (k) 


x(k)  = — y(k)  + 

o*Mk)+o2(k)  o*Mk)+o2(k) 


X*  (k) 


(6.11) 


oMk) 


0*  ^ (k)  Oy  (k) 
0*  ^ (k)  + a^ (k) 


(6.12) 


Proof:  Substituting  (6.10)  in  (5.6)  and  (5.9)  and  drop- 

ping the  time  dependence  k from  all  variables,  then  the 
estimate  x and  its  error  variance  at  time  k are  given 


(y-x) ^ (x-x*)^ 
x exp(-  )dx 


(6.13) 


(y-x)  ^ (x-x*)^ 

exp[-  ~ 


(y-x)  ^ (x-x*)^ 

(x-x)^exp[-  - ; 

2o^  2o*^ 

Y 


(6.14) 


(y-x) ^ (x-x*)^ 

exp[-  “ ” 


i«i  ^ - * 


^ 


_ 2 2 

, 2 2 2 2 i ^ ^ 1 rt  ^ \ V -2(o*  y + o.x*)x 

(v-x)  (X-X*)  a*  y +o^x*  +(o*  +07)^  ^ £ — L 

a;  o*^ 


22  * ^ 

Oy+O*  2 O* 

z— — [x  -2  ( 2 2^^ 


^ 2 ^ 2 


O ^ Y 2 . 

__JL-  X*)  j+^y 

00^  2 2 2 '§C  ^ 

a*  +Oy  0*  +Oy  o 


2 / 


a 1 = — ? 72 

a*  Xo-y 


/'  (X  2 ~ 2 2 

0*  +0 


2 * 2 

2 _ Qy  ^ 

5 22 

O*  +Oy 


then 


(y-x)  ^ (x-x*)^ 
a!  ^ o*" 


1 2 1 2 ^ 2 ^ ^2 

= _[x-(aiy+«2X*)l  - -7<«^y+«2X*)  + ' _*2^ 


o’ 


(6.17) 


letting 


111 

q = — vH  x*^-  — (a-y+a^x*) 

0^  0*' 

then  (6.17)  becomes 

^ lx-(a,y.<.,x*)l'^q  <6.18) 

o“  o*"  5’ 

Y 

Substituting  (6.18)  in  (6.13)  and  (6.14)  and  noting  that 
a , a , and  q are  independent  of  x,  yields 


X 


[x- (a jy+ttjX*) ] ^ 

exp[- ]dx 

2^' 


[x- (a jy+a^x*) ] ^ 
exp[-  ]dx 

2^' 


(6.19) 


^ lx(a jy+tt2X*)  ] ^ 

(x-x) ^exp[-  — ]dx 


0 


2 


[x- (a ,y+tt2X*) ] ^ 
exp[-  — ]dx 

2C' 


(6.20) 


Canceling  the  common  terms  from  the  numerator  and  the 
denomenator  of  (6.19)  and  (6.20)  and  realizing  that 

1 (oo  [x- (a jy+tt2X*)  ] ^ 

1 exp[-  ]dx  = 1 

^/2tt]-«>  2^^ 
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then 


1 [x- (a jV+ajX*) ] ^ 

X = X exp[-  ]dx 


1 /oo  ^ [x-(a  jY+ttjX*)  ] ^ 

(x-x)^exp[-  ]dx 


But  these  two  relationships  indicate  that  x and  are  the 
mean  and  variance  of  the  Gaussian  density  function 


1 [x- (aiY+a2X* ) ] ^ 

p(x)  = exp[ ] 

5/2?  2K^ 


therefore 


X - ttjy+a2X*  = 


0*2 

y + X X* 

0*2 +0,2  0*^  + 0?, 


q2=  = 


OyO*^ 


0*  +0' 


This  completes  the  proof. 

This  lemma  indicates  that  the  estimation  process,  in 
this  case  of  additive-Gaussian  noise,  involves  obtaining 
x*(k)  and  0*2  (k)  from  (4.22)  and  (4.33)  and  inserting  them 
in  (6.11)  and  (6.12)  to  find  the  estimate  and  its  error 


variance  at  each  time  k, 
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Figures  6.3  and  6.4  describe  the  application  of  the 
foregoing  linear  estimation  method  to  two  dimensional 
pictorial  data.  In  these  figures,  a stationary  nonsepa- 
rable  co’-relation  of  the  form 

R(  I i-k|  , I j-1 1 ) = Ex(i,j)x(k,l) 

= a^exp[-,4^2  ( j-1)  M (6.26) 

was  assumed  for  the  originals  in  Fig.  6.3(a)  and  Fig. 
6.4(a).  From  (6.6),  the  three  correlation  values  which 
are  used  in  finding  a , a ^ , and  a^i  were  found  to  be 

R(0,0)  = 1816 
R(0,1)  = 1807 
R(1,0)  = 1797 

Using  the  modeling  procedure  the  best  7th  order  autore- 
gressive model  was  obtained  as 

x(i,j)  = 0.87x(i, j-l)+0.02x(i, j-2)-0.03x(i-l, j-2) 
+0.01x(i-l, j-l)+0.03x(i-l, j )+0.03x(i-l, j+1) 
+0.13x(i-l, j+2)+0.29u(i, j)  (6.27) 

Note  that  with  the  above  model,  the  estimator  requires 
only  the  storage  of  the  current  and  the  previously  esti- 
mated line  of  the  image  at  each  time  k.  Based  on  the 
availability  of  two  lines  of  image,  at  each  time  k,  and 
the  use  of  the  guidelines  of  the  modeling  procedure,  the 


i 
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(a)  Original 
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(b)  Noisy,  SNR=l/2  (c)  Estimate 

Fig.  6.4  Additive-Gaussian  Observation  Noise 
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7th  order  model  of  (6.27)  was  found  to  be  ddequate. 


Figures  6.3(b)  and  6.4(b)  represent  noisy  images, 
where  the  noise  is  white  additive-Gaussian  with  the  signal 
to  noise  ratios  of  one  and  one-half  respectively.  The 
estimated  image  of  Fig.  6.3(c)  represents  a 7.54  and 
Fig.  6.4(c)  represents  an  8.4  db  improvement  (see  Appendix 
B for  the  definition  of  db  improvement) . 

6.4  Properties  of  the  Linear  Estimator 

For  an  autoregressive  model  of  the  form 
M 

x(k)  = I 3.x (k-I . )+Bu(k)  (6.28) 

i=l  ^ ^ 

when  the  observation  noise  is  additive-Gaussian,  then  the 
estimate  and  its  error  variance,  at  each  time  k are 
given  by 


x(k) 


q;(k) 


0*  ^ (k) +o^ (k) 


X*  (k)+- 


(k) 


(k) +o^ (k) 


•y  (k) 


(6.29) 


o"(k)a*Mk) 

a2  (k)=  — ^ 

a*  ^ (k) +a^ (k) 


(6.30) 


where 


64 


(6.31) 


x*(k)  = I 0.x(k-I.) 

i=l  ^ 

M 

a*2(k)  = B^'+I  I |0  la(k-Ii)]2  (6,32) 

i=l  ^ 

and  (k)  is  the  observation  noise  variance  at  time  k. 

Relations  (6.29)  through  (6.32)  indicate  that  imple- 
mentation of  the  linear  estimator  is  quite  simple  leading 
to  on  line  processing  of  either  images  or  one  dimensional 
signals.  An  encouraging  property  of  this  estimator  is 
that  the  error  variance  (k)  satisfies  the  set  of  recur- 
sive equations  (6.30)  and  (6.32)  which  are  independent  of 
the  value  of  the  observation  y(k).  This  enables  the  imple- 
mentation of  (6.30)  and  (6.31)  on  a digital  computer, 
prior  to  receptiop  of  any  observations,  and  the  investi- 
gation of  the  steady  state  behavior  of  the  error  variance. 
Since  this  estimation  method  is,  in  general,  suboptimal, 
as  will  be  discussed  in  Chapter  8,  the  above  a priori 
analysis  could  be  used  as  a basis  for  deciding  for  or 
against  the  use  of  the  foregoing  estimation  technique 
even  when  the  optimal  solution  exists.  Of  course,  the 
advantage  of  the  use  of  this  procedure  when  the  optimal 
solution  exists  is  the  computational  and  implementation 
simplicity  of  this  estimation  algorithm.  This  is  shown 
in  more  detail  in  the  following  example. 
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Example  6 . 1 : 


Consider  the  one 


dimensional  random  proce 


ss 


itoregressive 


x(k)  satisfying  the  following  au 

x(lc)  = 0.248x(k-l)+0.014x(k-2)+0.969u(lO 


model 


(6.33) 


Por  the  observation 


y (k)  = X (k)  +Y 


(6.34) 


with 


(6.35) 


variance  of  the  one  step  predicted  value  can  be 
the  error  variance  ot 

Obtained  from  the  two  recursive  relationships 


..2, VI  = ,n.969)“+I0.248o{k-l)+0.014o(k-2)l" 


If..  -^6) 


and 


oMk) 


0. 5o*^ (k) 

0.5+0*^ (k) 


(6.37) 


raiiic.  nf  o*^(k),  denoted  by  o*^ 
-pvaxa  r-(-»nverainq  value  or  o k i 


IC. 


is  found  to  be 


0*2  = 0.959 


(6.38) 


The  estimation  problem  as  defined  by  (6.33) 
,e.35)  can  also  be  done  optimally  by  defining  a 

vector  Z(k)  [221  as 


through 

random 
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z (k) 


Zi  (k) 
Z,  'K) 


with 


(6.39) 


Zj  (k)  = x(k-l)  (6.40) 

and 

Z^(k)  = Zj  (k-1)  = x(k-2)  (6.41) 

Using  (6.33),  (6.40)  and  (6.41),  the  elements  of  Z(k+1) 

can  be  written  as 


Zj(k+1)  = x(k)  = 0.248Z, (k)+0.014Z2 (k)+0.969u(k) 

(6.42) 

Zj  (k+1)  = Zj  (k)  (6.43) 

or  with  the  use  of  (6.39)  in  a vector  form  as 


• 

0.248 

0.014' 

0.969 

z(ktl)  = 

Z(k)  + 

1 

0 

0 

Similarly  the  observation  at  time  k-1  can  be  written  as 

y(k-l)  = [1  0]Z(k)+Y(k-l)  (6.45) 

Equations  (6.44)  and  (6.45)  have  the  same  form  as 


(1.14)  and  (1.15),  thus  the  Kalman  filte;.ing  technique  is 
applicable  and  the  optimal  estimates  can  be  obtained  from 
(1.18)  through  (1.20)  with 


6" 


A(k) 


B(k) 


C(k) 


D(k) 


0.248 

1 

0.969 

0 

I- 

= (1  0] 

L(k)D'(k)  = 


0.014 

0 


1 


K(k)  = 1 

Denoting  the  convergent  value  of  the  variance  of  the  opti- 
mal one  step  prediction  value,  obtained  from  implementing 
(1.19)  and  (1.20),  by  o^,  this  value  was  found  to  be 

oj  = 0.957  (6.46) 

Comparison  of  (6.38)  and  (6.46)  reveals  that,  if  the 
third  decimal  place  accuracy  is  negligible,  then  the 
proposed  method  should  be  used  for  estimation  since  this 
procedure  is  easily  implementable  and  does  not  require 
matrix  operations.  It  should  be  noted  that  a second  order 
model  was  considered  in  this  example,  in  order  to  outline 
the  desired  properties  with  the  least  amount  of  notational 
complexity.  The  computational  simplicity  of  the  proposed 
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estimator  will  become  exceedingly  attractive  when  the 
autoregressive  model  is  of  a much  higher  order,  in  which 
case  the  optimal  procedure  would  require  operations  on 
large  dimensional  matrices. 

6.5  Bounded  Multiplicative  Noise 


When 


Y(k)  = y(k)  [x(k)+M(k)l 


(6.47) 


and  i'  (k)  has  a density  function  p (y  (k)  ) bounded  between 


Yj  (k)  and  72^^^  with  0<y j (k) <Y2 (k) , then  [26] 


y(k) 

p [ ] 


y 


x{k)+M(k)  y(k) 

if  Yi(k).$: <Y2(k) 


p(y{k)  lx(k) ) =< 


x(k)+M(k) 


X (k)  +M  (k) 


Otherwise 


(6.48) 


For  images,  the  quantity  x(k)+M(k)  designates  the  intensity 
of  the  original  image  at  pixel  k and  hence  it  is  always 
positive.  This  is  used  in  reducing  condition 
y(k) 


Yi  (k)  >$- 


X (k) +M (k) 


Y,  (k) 


in  (6.48)  to  a condition  on  x(k)  as 
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^ .*.41^  ^ 


y(k)  y(k) 

.— — -M(k)  ^<  x(k)  << M(k) 


y,  (k) 


y^  (k) 


Thus,  (6.48)  can  equivalently  be  written  as 


p(y  (k)  |x(k)  )=<  x(k)-*-M(k) 


y (k) 

p [ ] 

^ x(k)+M(k)  y(k)  y(k) 

if M(k)^x(k)^ -M(k) 


Y2  (k) 


Y,  (k) 


Otherwise 


Substituting  (6.49)  in  (5.6)  and  (5.9)  and  dropping  sub- 

/V 

script  k from  all  variables,  then  x and  for  each  k are 


obtained  from 


1 f bj  X y (x-x*) ^ 

- 1 Pv( )exp[ — ]dx 

G jb  x+M  ’ x+M  2o*^ 


, 1 bj  (x-x)^  y 

o^=  - p ( )expt- 

G I bi  x+M  ' x+M 


(x-x* ) ^ 


where 


b 1 y (x-x*)^ 

p ( )e.xp[-  ]dx 


b 1 x+M  ' x+M 


20*2 


b = — -M 


b,= M 


Thus,  computation  of  x and  involves  evaluation  of  the 
above  definite  integrals. 

V/hen  is  uniform  then 


(k)-yj (k) 


if  y 1 (k)^y (k)<y 2 (k) 


p(y(k))  =< 


(6.54) 


Otherwise 


and  (6.50)  through  (6.52)  become 


lb,  X 


(x-x*) 


■exp[- 


Hjbj  x+M 


(6.55) 


^2_ 


Hjbi  x+M 


)2  (x-x*)^ 

- exp[-  ]dx 

20*2 


(6.56) 


1 


(x-x*) ^ 

-exp[- ]dx 


bj  x+M 


(6.57) 


The  numerical  technique  of  evaluating  (6.55)  through  (6.57) 
is  presented  in  more  detail  in  Chapter  7. 

Figures  6.5  through  6.8  represent  the  application  of 
the  foregoing  estimation  process  to  images  containing  uni- 
form multiplicative  noise.  Fig.  6.5(a)  represents  the 
same  image  as  in  Fig.  6.3(a),  therefore  it.,  autoregressive 
model  was  chosen  to  be  the  same  as  in  (6.27). 

The  binary  square  picture  of  Fig.  6.6(a)  and  Fig. 
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6.7(a)  are  32x32  images  viith  a background  intensity  of  10 
and  a foreground  intensity  of  20.  The  shaded  square  pic- 
ture of  Fig.  6.8(a)  is  again  a 32x32  image  with  intensity 
values  of  10,  15  and  20.  The  three  correlation  values  of 
the  square  picture  were  found  to  be  R (0 , 0) =18 . 75 , R(0,1)— 
16.99  and  R (1,0) =16. 99  and  those  of  the  shaded  square  were 
R(o,o)=12.1,  R(0,l)=11.08  and  R (1 , 0) =11 . 08 . 

A stationary  nonseparable  correlation  function  of  the 
form  (6.4)  was  assumed  for  these  images  and  the  following 
models  were  obtained; 

a)  For  the  square  picture  the  best  4th  order  model 
was  found  to  be 

X ( i , j ) =0 . 48x (i , j-1 ) +0 . 27x ( i-1 , j ) +0 . 18x ( i-1 , j+1 ) 

+0.07x(i-l, j+2)+0.46u(i, j)  (6.58) 

b)  For  the  shaded  square  the  best  4th  order  model 
was  given  by 

x(i, j)=0.48x(i, j-l)+0.27x(i-l, j)+0.18x(i-l, j+1) 

+0.07x(i-l, j+2)+0.32u(i, j)  (6.59) 

Table  6.1  summarizes  the  result  of  the  application  of 
the  estimation  pro'^edure. 
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(a)  Original 


(b)  Noisy,  noise=0. 7-1.0 


(c)  Estimate 


Fig.  6.5  Uniform  Multiplicative  Noise 
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(a)  Original 


(b)  Noisy,  noise=0 . 7-1 . 0 (c)  Estimate 


Fig.  6.6  Uniform  Multiplicative  Noise 


(a)  Original 


(b)  Noisy,  noise=0 . 4-1 . 0 


(c)  Estimate 


Fig.  6.7 


Uniform  Multiplicative  Noise 


IMAGE 


FIGURE 


NOISE  BOUNDS 


db  IMPROVEMENT 


h 

i: 


•V 

h 


Girl 


Square 


Square 


Shaded  Square 


6.5 


6.6 


6.7 


6.8 


0. 7-1.0 


0. 7-1.0 


0. 4-1.0 


0. 6-1.0 


5.48 


7.58 


7.72 


7.70 


Table  6.1 


Aside  from  the  quantitative  improvement,  as  indicated 
in  Table  6.1,  note  the  preservation  of  edges  in  the  esti- 
mated images  of  Fig.  6.5(c)  through  Fig.  6.8(c),  which  is 
a measure  of  subjective  improvement.  The  responsiveness 
of  the  estimator  to  abrupt  pixel  to  pixel  intensity  changes 
is  due  to  the  estimator  structure  of  Fig.  2.2,  since  it  is 
this  structure  that  allows  the  estimator  to  respond  to 
observation  nonlinearities . 


6.6  Observations  Containing  Additive  and  Multiplicative 
Noise  Terrs 


In  the  case  that  the  observation  is  of  the  form 


y(k)  = Y (k)  [x  (lt)+M(l<:)  ] +v  (k) 


(6.60) 


then  the  application  of  the  estimation  method  requires  the 
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derivation  of  the  density  function  p (y (k) 1 x (k) ) . This 
density  function,  then  can  be  substituted  in  (5.6)  and 
(5.9)  to  determine  the  estimate  and  its  error  variance 
at  each  time  k. 

Assuming  that  y (k)  and  v(k)  in  (6.60)  are  independent, 
the  conditional  density  p(y(k)|x(k))  can  be  obtained  in 
terms  of  the  convolution  of  the  probability  density  func- 
tions of  Y v(k)  [26].  This  is  achieved  by  noting 

that  conditioning  of  the  right  hand  side  of  (6.60)  on 
x(k)  will  make  the  quantity  x(k)+M(k)  nonrandom,  hence  [26] 

1 foo  y(k)-^ 

p(y  (k)  lx(k)  )= I p ( )p  (^)d^ 

[x(k)+M(k)  |)-oo  ^ x(k)+M(k)  ^ 

(6.61) 

In  order  to  outline  the  procedure  for  solving  (6.61), 
let  us  assume  that  y(k)  and  v(k)  are  both  uniformly  dis- 
tributed with 

1 

if  0<Yj (k)<y (k)$Y, (k) 

Y^ (k)-Yj (k) 

P (y(k))=- 

y 

0 otherwise  (6.62) 

and 
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a 


• Jf 


V,  (k)-Vj  (k) 


if  Vj  (k)^5:v(k)<?V2  (k) 


Otherwise 


(6.63) 


Dropping  the  time  dependence  k from  all  variables  and  again 
assuming  that  x+Mi>0  at  each  pixel  (which  is  the  case  for 


images)  then, 


y-^ 

P ( — )=<\ 
y x+M 


y-^ 

if  2 

x+M 


Otherwise 


(6.64) 


or 


if  y-Yz  (x+M)^C<y-Yi (x+M) 


Y,-Y 


y-^  , 

p ( )=^ 

^ x+M 


2 ' 1 


Otherwise 


(6.65) 


and 


if 


V^-Vj 


P (5)=S 

V ' 


Otherwise 


(6.66) 


y-^ 


In  order  to  find  the 


product  of  p.^  ( ) Py(^^' 


x+M 


J 
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required  to  know  whether 


v^-V,  .<  (x+M)  (Ta-Y,  ) 
or 

V^-V  I ^(x+M)  J 


(6.67) 


(6.68) 


The  necessity  of  this  requirement  can  easily  be  substan 
tiated  by  trying  to  find  the  density  function  of  the 
random  variable  z,  where 


z = a+b 

where  a and  b are  uniformly  distributed  between  a^,  and 
bj , b^,  respectively. 

Assuming  v is  the  dominant  noise  term  in  (6.60),  i.e. 
(6.68)  hold  at  each  pixel,  and  by  inserting  (6.65)  and 
(6.66)  in  (6.61)  and  carrying  out  the  integration  in  terms 

of  then  p(yjx)  is  given  as 


if 

y-v,  y-v 2 

-[  -Y2I 

A x+M 

Y2 

1 

1 

< 

S3 

1 

< 

if 

M$X^ 

V2-V1 

1 y-v 

> 

1 

> 

1 

-[ Y,1 

if 

M$:<$ 

A x+M 

Y2 

0 

Otherwise 
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where 


A = ( Y Mv  -V  j) 

Relation  (6.69)  can  be  substituted  in  (5.6)  and  (5.9) 
to  obtain  the  pertinent  filtering  equations. 


CHAPTER  7 


COMPUTATIONAL  ASPECTS 


The  estimation  procedure  developed  in  Chapters  2 
through  5 has  been  mainly  motivated  by  the  ease  of  the 
implementation  and  computational  considerations.  Since 
the  method  consists  of  the  three  parts,  namely  modeling, 
prediction  and  filtering,  the  computational  requirements 
of  each  will  be  discussed  separately. 

To  find  the  model  of  the  random  process  x(k),  a 
series  of  systems  of  linear  equations  must  be  solved.  This 
does  not  hamper  the  speed  or  running  time  of  the  estimation 
process,  since  the  modeling  procedure  is  implemented  prior 
to  the  reception  of  any  observations.  The  numerical  meth- 
ods, used  in  this  work  for  solving  each  system  of  linear 
equation,  is  just  one  of  the  many  standard  available  meth- 
ods [40]-t42]. 

In  the  actual  operation  of  the  estimator,  the  predic- 
tion scheme  requires  a minimal  amount  of  computation.  This 
only  involves  arithmetic  operations  to  find  x*(k)  and  the 
upper  bound  of  the  variance  o*^(k),  equations  (4.22)  and 
(4.33).  Depending  on  whether  or  not  the  observation  noise 
is  additive-Gaussian,  the  filtering  step  may  represent  the 
bulk  of  the  computational  requirements  of  the  entire  esti- 
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mation  procedure.  In  the  linear  case,  again,  finding  x(k) 
and  0^ (k)  involves  simple  arithmetic  operations  and  the 
computational  aspects  of  the  general  case  is  considered 
below. 


7.1  Nonlinear  Filtering 


Dropping  the  time  dependence  k (to  reduce  rotational 
complexity)  from  all  variables,  the  pertinent  filtering 
eguations  will  become  (eguations  (5.6)  and  (5.9)). 


(x-x* ) ^ 


xp (y 1 x) exp[- 


2o 


* 2 


] dx 


X - 


(x-x*)^ 

p(y|x)exp[-  ]dx 


(7.1) 


2o 


*2 


J. 


(x-x) ^p(y lx)expl- 


(x-x*) 


)dx 


2o< 


(7.2) 


1 


(x-x*) ^ 

p(ylx)exp[-  ]dx 

2a*  ^ 


In  the  general  case,  where  (7.1)  and  (7.2)  do  not 
have  a closed  analytic  form,  the  integrals  should  be  evalu- 
ated numerically.  Although  (7.1)  and  (7.2)  suggest  that 
three  numerical  integrations  are  required  at  each  time  k, 
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the  following  expansion  of  (7.2)  can  help  in  reducing  this 
number  to  two.  An  example  of  this  is  the  case  of  the  uni- 
form multiplicative  observation  noise  of  Chapter  6,  derived 
in  Section  7.2. 


a^=- 


(x-x*) 2 

/P(y|x)exp[-  ] 

2a*  ^ 


- {/[x^-2xx+xMp(y  |x)exp[ ]dx} 


2o*2 


(x-x*) ^ 

/x^p(y  |x)e>;p(-  ]dx 

2o*^ 


(x-x*) 2 

/p(y|x)exp[-  Jdx 

2o*  ^ 


(7.3) 


7.2  Uniform  Multiplicative  Noise 

Having  developed  the  estimator  for  the  case  of  bounded 
multiplicative  noise,  the  detailed  expansion  and  the  meth- 
ods of  evaluation  of  the  integrals  in  (7.1)  and  (7.2)  for 
the  particular  case  of  uniform  multiplicative  observation 
noise  is  presented  in  this  section.  Therefore,  when 

1 

p(y|x)  = 

(Yj-Yi ) (x+M) 

then  (7.1)  and  (7.2)  become 
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ID  X (X-X*) 

exp[-  

a x+M  2a* ^ 

X = — 

'b  1 (x-x*) 

exp  (- 

a x-M  2o*^ 


b x^  (x-x*) 

exp(-  

a x+M  2a* ^ 


'b  1 (x-x*) 

exp  ( 

J a x+M  2a*  ^ 


where 

y 

a = — -M 
Y2 


Let 


I 


1 


'b  1 (x-x 

exp [ 

J a x+M  2a* 


Letting 


z = x+M 


results  in 


(z-M-x*) ^ 


b X (x-x*)"  fb+M 

]dx  = 1 exp[ -Idz 


”exp[- 


a x+M 


2a* 


la+M  z 


2a* 


•I 


(x-x*)  2 


exp[-  

a 2a* 


(x-x* ) ^ b 1 

]dx-M  expt-  - 

2 a x+M  2a*  ^ 


] dx 


(7.1 


nd 


b X' 


-exp  [ - 


a x+M 


(x-x*) ^ 

b+M 

] dx  -- 

2o*^ 

a+M 

M' 


(Z-M-x*) ^ 


[Z-2M+ — ]exp[-- 


■1 


2a*' 


■! 


(x-x* ) ^ 


b (x-x*)^  b 

(x+M) exp f-  ]dx-2M  exp[-  — 

2 a 2a* 


-]dx 


2a* 


+M  = 


(b  1 (x-x*)=^ 

xp[-  ]dx 


(7, 


a x+M  2o* 


Let 


Ab  (x-x*)^ 

I2  = exp[-  --  Idx 
Ja  2o*^ 


(7. 


Using  (7.7)  and  (7.12),  then  relations  (7.10)  and  (7.11) 


become 


) a x+M 


2a*‘ 


b (x-x*)^ 

exp[-  l<^x 

a x+M  2a* ^ 


(x-x*) ^ 

exp[-  Idx-MIj+M^I 

2a*  2 


(7. 


Furthermore,  with  a change  of  the  variable 


z = x-x* 
dz  = dx 


(7. 


then 


' b 
- a 


(x-x*) 2 

X exp(- ]dx 

20*2 


z2  fb  (x-x*) 2 

exp[- ]dz+x*|  exp[-  ]dx 

20*2  ja  20*2 


(7 


But 


fb-x*  z"  , (a-x*)'  (b-x*) 

z expl- ld--0*Mexpl-  — 1-expl- 


-V* 


a-x 


20*' 


20* 


20* 


(7 


Letting 


(a-x*) 


(b-x*) 2 


Q = o*2{expi-  ]-exp[- 


''0-2 


-]} 


(7 


then  using  (7.16)  relation  (7.14)  can  be  written  as 


j b (x-x* ) ^ 

expl- ]dx  = Q+(x*-M)I  +M=^I  (7.19) 

J a x+M  2a* ^ 

Substituting  (7.13)  and  (7.19)  in  (7.4)  and  (7.5)  results 
in 

X M (7.20) 

Q+(x*-M)l2 

= +M^-x2  (7.21) 

I, 

where  I,,  and  Q are  given  uy  (7.7),  (7.12)  and  (7.18), 
respectively. 

/V  /V  - 

Thus,  finding  x and  a^  requires  carrying  out  the  two 
integrations  1^  and  Since,  the  functions  involved  in 

these  integrations  are  continuous  and  well  behaved,  the 
Romberg's  method  of  numerical  integration  was  used  [12]  to 
determine  the  estimates  of  32x32  images  in  Fig.  6.6(c) 
through  Fig,  6.8(c).  This  integration  method  is  reasonably 
fast  and  requires  minimal  coding.  For  the  images  in 
Fig.  6.6  through  Fig.  6.8,  the  total  CPU  time  of  setting 
up  the  iaage,  the  observation  and  obtaining  the  estimated 
image,  was  about  2.5  minvites  on  PDP-10  computer. 

Although  the  CPU  time  of  2.5  minutes  for  the  32x32 
images  of  Figs.  6.6  through  6.8  is  reasonable,  this  time 
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h 


f 

I 


I 


J 


increases  linearly  with  the  increase  in  the  size  of  the 


image.  Such  an  increase,  in  general,  may  not  be  tolerable 
in  practical  situations.  Therefore,  for  images  of  the  size 
256x256  (such  as  those  in  Fig.  6.5)  or  higher,  tli-i  appli- 
cation of  the  method  ma”  warrant  some  approximation  of 
the  integrals  Ij  and  Ij.  For  small  values  of  the  predic- 
tion va’^iance  o*^,  a crude  approximation  on  the  exponential 
form  of  p(x(k))  can  be  made  by  expanding  the  exponential 
in  its  series  equivalent  and  retaining  the  first  twc  terms. 
This  results  in  representing  p(x(k))  at  each  time  k by 


2o*  1 

p(x)  = ( ) 

/2tT  2o*^+(x-x*)^ 


(7.22) 


With  this  it  can  be  shown  that  x and  are  obtained  from 


X = - 
A 


(7.23) 


g2  = -[q-2Bx+AxM 

A 


(7.24) 


where 


fl 

A = 

/ < 


p(y  |x)p(x)dx 


1 (x+M) ^ qj 

= [ In - tan 

2qj  2o*^+(x-x*)^  q,/q 


qj  -1  2(x+M)+q  x=b 


-]  (7.25) 

/q  x=a 
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-■J 


■i: 


b 2-1  2(x+M)+q2  x=b 

B = 1 xp  (y  I x)  p (x)dx  = [—tan  ] -MA 

a 


/q  x=a 


(7.26) 


Q = 1 x^p(y  lx)p(x)dx 
a 


1 q2  2 -1  2x+2M+q2  x=b 

M^A+[-ln(2o*='+(x-x*)M-(2M+— ) (— ) tan  — 1 

2 2 /q  x=a 


with 


qj  = 2o*^+(M+x*) 


q.  = -2(M+x*) 


q ^ 4 [2o*^+ (M+x*)  M-4  (M+x*) 


and 


a = 


y 

— -M 

y? 


b - 


y 

M 

Y 


1 


(7.27) 
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CHAPTER  8 


A DISCUSSION  ON  THE  OPTIMALITY  OF  THE 
PROCEDURE,  EXTENSIONS  AND  CONCLUSIONS 

8.1  Discussion  on  Optimality 

Due  to  the  various  restrictions  imposed  on  the  esti- 
mator in  Chapter  2,  the  estimation  procedure  as  developed 
in  Chapters  2 through  5 is,  in  general,  suboptimal.  An 
exception  to  this  is  the  case  considered  in  the  following 

theorem. 

Theorem  8.1;  If  the  observation  noise  is  additive-Gaus- 
sian  and  the  process  x(k)  is  a first  order  normal  Markov 
process,  then  for  given  initial  conditions,  x(0)  and  o(0), 
the  procedure  is  optimal. 

Proof:  In  this  case,  the  model  of  the  process  x(k)  and 

that  of  observation  are  given  by 

x(k)  = 6x(k)+Bu(k) 

y(k)  = x(k)+Y(k) 

In  order  to  show  the  optimality  of  the  procedure  in  this 
particular  case,  let  us  assume  that  the  estimate  x(k-l) 
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at  time  k-1  have  been  found 
and  its  error  variance  o (k-1)  at  time 


optimally!  i-e. 


x(k-i)  = Ex (k-1)  jy (1) » • • • * »y 


a (k-1) 


= E[x(k-l)-x(k-l) ] jy (1) , •/y(k-l) 


(8.3) 


(8.4) 


* ■it'?  error  variance  o*  (k)  are 

The  predicted  value  x* (k)  and  its  error  v 


obtained  from 


X* (k)  = 8x(k-l) 


2 2^2 


(k)  = B +8  0 (k-1) 


(8.5) 


(8.6) 


But  from  (8.1) 


Ex(k)|y(l) yC-I*  ' BEx(k-l)lyU) Ylk-D 

+ BEu(k) |y (1) » . • • »y (k-1) 

= 3Ex(k-l)  jy (1) » • • • »y 


(8.7) 


v/here  by  using 


(8.3)  and  (8.5),  it  follows  that 


Ex(k)  |y  (1)  , • • wy(k-l)  - 3x(k-l)  x*  (k) 


and  similarly 


E[x(k)-Ex(k)  ly(l) y(k-l)  ] |y(l)  ,.--y(k-l) 


2 2 /V  2 

= B +8  O (k-1) 


But  x*(k)  and  o*\k)  being  optimal  quantities  makes  the 
chosen  a posteriori  density  of  x(k)  in  Section  5.1  to  be 
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p(x(k))  = p (x(k) |Y (1) , . . . ,y (k-1) ) 


(8.10) 


and  the  substitution  of  (8.10)  in  (5.6)  and  (5.9)  results 


x(k)  = 


Jx(k)p(y  (k)  lx(k)  )p(x(k)  [y  (1)  , . . . ,y  (k-1) ) dx  (k) 
Jp(y(k) |x(k) )p(x(k) |y (1) , . . . ,y (k-1) ) dx (k) 


(8.11) 


^2  J[x(k)-x(k)]  p(y(k)  ]x(k)  )p(x(k)  |y  (1)  , . . ,y  (k-1)  )dx(k) 

o (k)=— 

/p(y(k)  |x(k))p(x(k)  |y(l),..,y(k-l))dx(k) 

(8.12) 

A comparison  of  (8.11)  and  (8.12)  with  (2.10)  and  (2.13) 
indicates  that  x(k)  and  SNk)  are  optimal  quantities. 

The  proof  of  the  theorem  follows  by  applying  the  above 

argument  for  k=l»2,.... 

In  the  derivation  of  the  general  estimation  procedure, 
the  value  of  the  prediction  error  variance  was  approximated 
by  its  upper  bound  in  Section  4.3.  As  stated  previously, 
this  approximation  was  introduced  in  order  to  maintain 
computational  simplicity  of  the  overall  algorithm,  but 
since  the  error  variance  is  an  indication  of  the  uncertainty 
of  the  value  of  each  estimate  then  the  effect  of  such  an 
approximation  on  the  values  of  the  subsequent  error  vari- 
ances should  be  investigated.  In  the  following,  the  effect 
of  such  an  approximation  on  the  error  variance  of  the 
linear  estimator  of  Section  6.3  is  analyzed  and  it  is 
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shown  that  the  introduction  of  the  upper  bound  causes  all 
future  computed  error  variances  to  be  larger  than  the 
actual  variances. 

At  each  time  k,  the  actual  unavailable  prediction 
2 

error  variance  o (k)  is  approximated  by  its  upper  bound 

2 P 

o*  (k)  where  (equations  (4.32)  and  (4.33)) 


o*^  (k)  = B^+[  I I 6. |o  (k-I. ) ]^ 
i=l  ^ ^ 


(8.13) 


with 


2 2 
o (k)  .<  0*  (k) 
P 


(8.14) 


The  computed  error  variance  of  the  filtered  quantity  is 
given  by 


2 2 

-2  0*  (k)o  (k) 

o (k)  = — y--— ' Tg 

0*  (k)+a  (k) 


(8.15) 


Z 

Assuming  (k)  is  available  at  time  k then  the  actual  vari- 


ance o (k)  would  be 
a 


2 t^p(k)o  (k) 

= -T V- 

Op(k)+o^ (k) 


(8.16) 


But  due  to  (8.14)  and  o^(k)  being  the  observation  noise 
variance,  then  it  follows  from  (8.15)  and  (8.16)  that 


(k)  <<  o (k) 

a 


(8.17) 
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Recursively,  the  substitution  of  (8.17)  in  (8.13)  shows 

2 

that  the  computed  prediction  error  variance,  a*  (k) » is 
always  larger  than  the  actual  variance.  Therefore,  for 
the  linear  estimator  of  Section  6.3,  all  computed  variances 
are  the  upper  bounds  for  the  actual  variances. 

8.2  Discussion  of  Nonlinear  Case 

In  the  light  of  the  discussions  in  the  previous  sec- 
tion, it  is  expected  that  the  computed  and  actual  variances 
of  the  nonlinear  estimator  behave  similar  to  those  of  the 
linear  case.  The  difficulty  in  showing  this  is  in  having 
o^(k)  satisfy  an  integral  relation  of  (5.9).  It  is  sus- 
pected, however,  that  the  relationship  between  the  computed 
and  the  actual  variances  of  the  linear  case  does  not  hold 
true  for  all  nonlinear  observations  while  for  a certain 
class  of  nonlinearities  the  same  results  may  exist. 

This  subject  requires  a more  exact  and  rigorous  analy- 
sis and  is  an  excellent  candidate  for  topic  of  future 
investigation . 

8.3  Extensions  and  Topics  for  Further  Research 

The  estimation  method  of  Chapters  2 through  5 has  been 
derived  based  on  the  restriction  of  the  linear  prediction. 
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But  if  a nonlinear  model  of  the  random  process  x(k)  exists 
Chen  the  prediction  procedure  could  be  replaced  by  the 
appropriate  nonlinear  one.  In  general,  if  the  form  of 
nonlinearity  is  known  then  the  modeling  procedure  can  be 
modified  in  order  to  determine  a nonlinear  model  of  the 
process  x(k).  This  modification  is  conciptually  simple 
and  would  require  a priori  decision  on  the  order  and  the 
degree  of  the  nonlinearity.  For  example,  for  a given 
order  M and  the  2nd  degree  polynomial  nonlinearity  the 
modeling  procedure  can  be  used  in  obtaining  a model  of 

the  form 

x(k)  = I 3.x  (k-Ii)+  I 

i=l  i=l 

(8.18) 

Of  course,  (8.18)  is  a particular  form  of  nonlinearity  and 
it  should  be  noted  that  in  order  to  apply  the  modeling 
procedure  the  a priori  statistics  must  contain  up  to  and 
including  the  3rd  moment  of  the  random  process  x(k) . 

8.4  Conclusion 

This  dissertation  has  examined  and  expanded  the  sub- 
ject of  general  image  estimation.  An  estimation  procedure 
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has  been  developed  with  a particular  emphasis  on  the  multi- 
plicative and  non-Gaussian  observation  noise.  An  analysis 
of  the  optimal  discrete  filter  has  been  presented  to  show 
that  the  principle  of  the  estimation  at  each  time  k consists 
of  a one  step  prediction  and  filtering  operations.  Concep- 
tually these  operations  are  shown  to  closely  resemble  a 
learning  procedure  based  on  the  past  information  and  the 
optimal  use  of  the  present  information.  Accordingly,  a 
recursive  estimation  procedure  is  derived  such  that  the 
logic  of  the  estimation  principle  is  maintained  and,  at 
the  same  time,  the  procedure  is  implementable. 

Although  the  derivation  and  application  of  the  method 
has  primarily  been  presented  in  terms  of  the  two  dimensional 
processes,  the  procedure  is  directly  applicable  to  one 
dimensional  problems.  A particularly  important  and  prac- 
tical feature  of  the  estimation  method  is  the  method's 
independence  of  the  analytic  representation  of  the  a priori 
correlation  function.  An  equally  significant  value  of  the 
procedure,  in  this  respect,  is  its  applicability  to  prob- 
lems where  only  partial  values  of  the  correlation  function 
are  specified. 

The  estimation  method  is  demonstrated  to  be  applicable 
to  a broad  class  of  observation  systems  and,  in  fact,  the 
degree  of  ease  or  difficulty  in  applying  the  method  to 
general  nonlinear  systems  is  directly  related  to  the  ease 
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appendix  a 


A DISCUSSION  ON  ERROR  VARIANCE 


For  a given  set  of  observation  yd) ,y(k).  the 

estimate  xW  of  the  process  x (k)  at  time  k is,  in  general, 
some  function  f of  the  observations  which  can  be  written  as 

x(k)  = f(yd), y(k)) 

2 /\  2 

Accordingly,  the  two  error  variances,  S,(k)  and  Oj(k),  of 


x(k)  can  be  defined  as 


A ? 


(k)  = E{ [x(k)-x(k) 1 } 


o,(k) 


= E{ [x (k) -X (k) 1 jy (1) » . • • • »y  ^ 


(A. 2) 
(A. 3) 


From  (A.l)  and  (A.2),  it  follows  that 

A 2 -'2 

0 J (k)  = E^O  2 (k) 

where  E^  represents  the  expectation  with  resp-^ct  to 
y(l), ,y(k).  Letting 


e(k)  = x(k)-x'.k) 

then  e(k)  is  a random  variable  and  (A.2)  and  (A. 3)  can 
equivalently  be  written  as 
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(A. 6) 


2 

(k)  = 

/...//e  (k)p(e(k),y(l), ,y{k))de{k)dy(l),...,dy{k) 

/N  2 2 

o^(k)  = /e  (k)p(e(k)  lY(l),...,y(k))de(k)  (A. 7) 

In  the  linear  optimal  case  of  Section  1.3,  the  exist- 
ence of  the  orthogonality  principle  (equation  (1.12)) 
along  with  the  zero  mean  «ind  the  Gaussian  nature  of  the 
random  variables  result  in  the  statistical  independence  of 
e(k)  and  y ( 1) y (k)  [26]  which  reduces  (A. 6)  and  (A. 7) 

to 

0j(k)  = 02<k)  = /e  (k)  p (e  (k)  ) de  (k)  (A. 8) 

but  in  general 

0 j (k)  ^02  (k)  (A. 9) 

/s  2 

For  a given  sample  function  of  the  observation, 02 (k! 

in  (A. 3)  specifies  the  amount  of  variation  (uncertainty) 

associated  with  choosing  x(k)  as  the  estimate  of  x(k)  at 

^ 2 

time  k.  The  quantity  0j(k),  on  the  other  hand,  represents 

/s 

the  ensemble  average  of  the  variance  when  finding  x(k)  from 
(A.l)  and  for  all  possible  values  of  y (1) , . . . . ,y (k) . 

Since  in  this  dissertation,  our  interest  is  in  a particular 
sample  function  of  the  observation  (the  degraded  image) , 
quantity  (A. 3)  is  taken  to  represent  the  variance. 
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APPENDIX  B 

AN  II-IPROVEMENT  MEASURE  FOR  ESTIMATED  IMAGES 

Letting  b(i,j),  y(i,j)  and  b(i,j)  denote,  in  order 

the  intensities  of  the  original  image,  the  noise  corrupted 

image  and  the  estimated  image  at  pixel  (i,j)»  then  the  two 

quantities  o and  o are  computed  as 
n s 

o'  = — 7 I I tb(i,j)-y(i.j)l 
N i=l  j=l 

Oe  = A ^ I [b(i.j)-b(i,j)l'  (B.2) 

N i=l  j=l 

where  N is  the  size  of  the  image. 

Viewing  and  as  the  average  error  variance 
n e 

associated  with  the  observation  and  the  estimate,  respec- 
tively, the  amount  of  improvement  in  db  is  obtained  from 

db  improvement  = 10  logjj 
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